Abstract. Modern construction of uniform confidence bands for nonparametric
densities (and other functions) often relies on the Smirnov-Bickel-Rosenblatt
(SBR) condition; see, e.g., Giné and Nickl (2010). This condition requires the
existence of a limit distribution of an extreme value type for the supremum of
a studentized empirical process (equivalently, for the supremum of a Gaussian
process with an equivalent covariance kernel). The principal contribution of
this paper is to remove the need for the SBR condition. We show that a weaker
sufficient condition is an anti-concentration inequality for the supremum of
the approximating Gaussian process, and we derive such an inequality under
weak assumptions. Our new result shows that the supremum does not concentrate
too fast around its expected value. We then apply this result to derive a
Gaussian bootstrap procedure for constructing honest and adaptive confidence
bands for nonparametric density estimators, completely avoiding the need for
the SBR condition. An essential advantage of our approach is that it applies
even in those cases where the limit distribution does not exist (or is
unknown). Furthermore, our approach provides an approximation to the exact
finite sample distribution with an error that converges to zero at a fast,
polynomial speed (with respect to the sample size). In sharp contrast, the
Smirnov-Bickel-Rosenblatt approach provides an approximation with an error
that converges to zero at a slow, logarithmic speed.



1. Introduction 

Let $X_1, \dots, X_n$ be i.i.d. random variables with common unknown density
$f$ on $\mathbb{R}^d$. We are interested in constructing confidence bands for
$f$ on a subset $\mathcal{X} \subset \mathbb{R}^d$ that are honest and adaptive
to a given class $\mathcal{F}$ of densities on $\mathbb{R}^d$. Typically,
$\mathcal{X}$ is a compact set on which $f$ is bounded away from zero, and
$\mathcal{F}$ is a class of smooth densities such as a subset of a Hölder ball.
A confidence band $\mathcal{C}_n = \mathcal{C}_n(X_1, \dots, X_n)$ is a family
of random intervals

$$\mathcal{C}_n := \{ \mathcal{C}_n(x) = [c_L(x), c_U(x)] : x \in \mathcal{X} \}$$

that contains the graph of $f$ on $\mathcal{X}$ with a guaranteed probability.
Following [19], a band $\mathcal{C}_n$ is said to be asymptotically honest with
level $\alpha \in (0,1)$ for the class $\mathcal{F}$ if

$$\liminf_{n \to \infty} \inf_{f \in \mathcal{F}} P_f\big( f(x) \in \mathcal{C}_n(x), \ \forall x \in \mathcal{X} \big) \ge 1 - \alpha.$$

Let $\hat f_n(\cdot, l)$ be a generic estimator of $f$ with a smoothing
parameter $l$, say bandwidth or resolution level, where $l$ is chosen from a
candidate set $\mathcal{L}_n$. Let $\hat l_n = \hat l_n(X_1, \dots, X_n)$ be a
possibly data-dependent choice of $l$ in $\mathcal{L}_n$. Denote by
$\sigma_{n,f}(x,l)$ the standard deviation of $\sqrt{n} \hat f_n(x,l)$, i.e.,
$\sigma_{n,f}(x,l) := (n \mathrm{Var}_f(\hat f_n(x,l)))^{1/2}$. Then, we
consider a confidence band of the form

$$\mathcal{C}_n(x) = \left[ \hat f_n(x, \hat l_n) - \frac{c(\alpha)\, \sigma_{n,f}(x, \hat l_n)}{\sqrt{n}}, \ \hat f_n(x, \hat l_n) + \frac{c(\alpha)\, \sigma_{n,f}(x, \hat l_n)}{\sqrt{n}} \right], \tag{1.1}$$

where $c(\alpha)$ is a (possibly data-dependent) critical value chosen so that
the confidence band has level $\alpha$. Generally, $\sigma_{n,f}(x,l)$ is
unknown and has to be replaced by an estimator.

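A crucial point in the construction of confidence bands is the computation of
the critical value $c(\alpha)$. Assuming that $\sigma_{n,f}(x,l)$ is positive
on $\mathcal{X} \times \mathcal{L}_n$, define the stochastic process

$$Z_{n,f}(v) := Z_{n,f}(x,l) := \frac{\sqrt{n}\,\big(\hat f_n(x,l) - E_f[\hat f_n(x,l)]\big)}{\sigma_{n,f}(x,l)} \tag{1.2}$$

for $v = (x,l) \in \mathcal{X} \times \mathcal{L}_n =: \mathcal{V}_n$. We refer
to $Z_{n,f}$ as a "studentized process". If, for the sake of simplicity, the
bias $|f(x) - E_f[\hat f_n(x,l)]|_{l = \hat l_n}$ is sufficiently small
compared to $\sigma_{n,f}(x, \hat l_n)$, then

$$P_f\big( f(x) \in \mathcal{C}_n(x), \ \forall x \in \mathcal{X} \big) \approx P_f\Big( \sup_{x \in \mathcal{X}} |Z_{n,f}(x, \hat l_n)| \le c(\alpha) \Big) \ge P_f\Big( \sup_{v \in \mathcal{V}_n} |Z_{n,f}(v)| \le c(\alpha) \Big),$$

so that the band (1.1) will be of level $\alpha \in (0,1)$ by taking

$$c(\alpha) = (1-\alpha)\text{-quantile of } \|Z_{n,f}\|_{\mathcal{V}_n} := \sup_{v \in \mathcal{V}_n} |Z_{n,f}(v)|, \tag{1.3}$$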

which is, however, infeasible since the finite sample distribution of the
process $Z_{n,f}$ is unknown. Instead, we estimate the $(1-\alpha)$-quantile of
$\|Z_{n,f}\|_{\mathcal{V}_n}$. Suppose that one can construct a sequence of
random variables $W^0_{n,f}$ that are equal in distribution to the suprema of
zero mean tight Gaussian processes $G_{n,f}$ indexed by $\mathcal{V}_n$ with
known or estimable covariance structure
($W^0_{n,f} \stackrel{d}{=} \|G_{n,f}\|_{\mathcal{V}_n}$), and such that
$\|Z_{n,f}\|_{\mathcal{V}_n}$ is close to $W^0_{n,f}$. Then, we may approximate
the $(1-\alpha)$-quantile of $\|Z_{n,f}\|_{\mathcal{V}_n}$ by

$$c_{n,f}(\alpha) := (1-\alpha)\text{-quantile of } \|G_{n,f}\|_{\mathcal{V}_n}.$$

Typically, one computes or approximates $c_{n,f}(\alpha)$ by one of the two
methods below.

1. Analytical method: derive analytically an approximated value of
$c_{n,f}(\alpha)$, by using an explicit limit distribution or large deviation
inequalities.

2. Simulation method: simulate the Gaussian process $G_{n,f}$ to compute
$c_{n,f}(\alpha)$ numerically, by using, for example, a multiplier method.
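
As a rough illustration of the simulation method, the following sketch draws
from a finite-dimensional Gaussian vector with a known covariance matrix and
reads off the $(1-\alpha)$-quantile of its maximum absolute coordinate. The
grid, the exponential covariance, and the helper name `gaussian_sup_quantile`
are assumptions made purely for illustration; in the setting of the paper the
covariance of the Gaussian process is determined by the estimator and usually
has to be estimated.

```python
import numpy as np

def gaussian_sup_quantile(cov, alpha, n_draws=20000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of max_v |G(v)| for G ~ N(0, cov)."""
    rng = np.random.default_rng(seed)
    dim = cov.shape[0]
    # Small diagonal jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(dim))
    draws = rng.standard_normal((n_draws, dim)) @ chol.T
    sup_abs = np.abs(draws).max(axis=1)
    return float(np.quantile(sup_abs, 1.0 - alpha))

# Hypothetical covariance on a grid of 50 points: unit variances, with
# correlation decaying exponentially in the distance between grid points.
grid = np.linspace(0.0, 1.0, 50)
cov = np.exp(-np.abs(grid[:, None] - grid[None, :]) / 0.1)
c_alpha = gaussian_sup_quantile(cov, alpha=0.05)
print(f"simulated critical value: {c_alpha:.3f}")
```

The simulated value exceeds the pointwise normal quantile because the supremum
is taken over many correlated coordinates; a finer grid or weaker correlation
pushes it higher.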

The main purpose of the paper is to introduce a general approach to
establishing the validity of the so-constructed confidence bands. Importantly,
our analysis does not rely on the existence of an explicit (continuous) limit
distribution of any kind, which is a major difference from the previous
literature. If, for some normalizing constants $b_n$ and $d_n$,
$d_n(\|G_{n,f}\|_{\mathcal{V}_n} - b_n)$ has a continuous limit distribution,
the validity of the confidence bands constructed would follow via the
continuity of the limit distribution. For the density estimation problem, if
$\mathcal{L}_n$ is a singleton, i.e., the bandwidth or resolution level is
chosen deterministically, the existence of such a continuous
limit distribution, which is typically a Gumbel distribution, has been estab- 
lished for kernel density estimators and some wavelet density estimators [see 
m, Q, [m, Q, S, 0, [l3] ■ We refer to the existence of the limit distribution as 
the Smirnov-Bickel-Rosenblatt (SBR) condition. However, the SBR condition has
not been obtained for other density estimators such as those based on
projection kernels with orthogonal polynomials and trigonometric functions. In
addition, guaranteeing the existence of a continuous limit distribution often
requires more stringent regularity conditions than a Gaussian approximation
itself. More importantly, if $\mathcal{L}_n$ is not a singleton, which is
typically the case when $\hat l_n$ is data-dependent, and so the randomness of
$\hat l_n$ has to be taken into account, it is often hard to determine an exact
limit behavior of $\|G_{n,f}\|_{\mathcal{V}_n}$.

We thus take a different route. Our key ingredient is the anti-concentration
property of suprema of Gaussian processes. In studying the effect of
approximation and estimation errors on the coverage probability, it is required
to know how the random variable
$\|G_{n,f}\|_{\mathcal{V}_n} := \sup_{v \in \mathcal{V}_n} |G_{n,f}(v)|$
concentrates or "anti-concentrates" around, say, its $(1-\alpha)$-quantile. It
is not difficult to see that $\|G_{n,f}\|_{\mathcal{V}_n}$ itself has a
continuous distribution, so that with $n$ kept fixed, the probability that
$\|G_{n,f}\|_{\mathcal{V}_n}$ falls into the interval with center
$c_{n,f}(\alpha)$ and radius $\epsilon$ goes to 0 as $\epsilon \to 0$. However,
what we need to know is the behavior of those probabilities when $\epsilon$ is
$n$-dependent and $\epsilon = \epsilon_n \to 0$. In other words, bounding
explicitly "anti-concentration" probabilities for suprema of Gaussian processes
is desirable. We will first establish bounds on the Lévy concentration function
(see Definition 2.1) for suprema of Gaussian processes and use these bounds to
quantify the effect of approximation and estimation errors on the finite sample
coverage probability.

1.1. Related references. Confidence bands in nonparametric estimation have
been extensively studied in the literature. A classical approach, which goes
back to [25] and [1], is to use explicit limit distributions of normalized
suprema of studentized processes. A "Smirnov-Bickel-Rosenblatt type limit
theorem" combines Gaussian approximation techniques and extreme value theory
for Gaussian processes. It was argued that the convergence to normal extremes
is considerably slow even though the Gaussian approximation is relatively fast
[iB]. To improve the finite sample coverage, bootstrap is often used in
nonparametric estimation [seed, 0]. However, to establish the validity of
bootstrap confidence bands, these works relied on the existence of continuous
limit distributions of normalized suprema of the original studentized
processes. In the deconvolution density estimation problem, [20] considered
confidence bands without using Gaussian approximation. In the current density
estimation problem, their idea reads as bounding the deviation probability of
$\|\hat f_n - E[\hat f_n(\cdot)]\|_\infty$ by using Talagrand's [26] inequality
and replacing the expected supremum by the Rademacher average. Such a
construction is indeed general and applicable to many other problems, but is
likely to be more conservative than our construction.

1.2. Organization of the paper. In the next section, we give a new
anti-concentration inequality for suprema of Gaussian processes. Section 3
describes two new coupling inequalities. Together, Sections 2 and 3 provide
powerful tools for proving the validity of bootstrap methods to estimate
distributions of suprema of empirical processes of VC type function classes.
Section 4 contains the theory of honest and adaptive confidence band
construction under high-level conditions. These conditions are easily satisfied
both for convolution and projection kernel techniques under mild assumptions.
Section 5 gives primitive sufficient conditions. Finally, all proofs are
contained in the Appendix.

1.3. Notation. In what follows, constants $c, C, c_1, C_1, c_2, C_2, \dots$ are
understood to be independent of $n$ and to be strictly positive. The values of
$c$ and $C$ may change at each appearance but constants
$c_1, C_1, c_2, C_2, \dots$ are fixed. Throughout the paper, $E_n[\cdot]$
denotes the average over index $1 \le i \le n$, i.e., it simply abbreviates the
notation $n^{-1} \sum_{i=1}^n [\cdot]$. For example,
$E_n[x_{ij}] = n^{-1} \sum_{i=1}^n x_{ij}$. Finally, for a function
$\{Y(t) : t \in T\}$, $\|Y(t)\|_T$ denotes the supremum norm, i.e.,
$\|Y(t)\|_T := \sup_{t \in T} |Y(t)|$.

2. Anti-concentration of suprema of Gaussian processes 

The main purpose of this section is to derive an upper bound on the Lévy
concentration function for suprema of separable stochastic processes, where the
terminology is adapted from [2^. Let $(\Omega, \mathcal{A}, P)$ be the
underlying probability space.

Definition 2.1. Let $Y = (Y_t)_{t \in T}$ be a separable stochastic process
indexed by a semimetric space $T$. For all $x \in \mathbb{R}$ and
$\epsilon > 0$, let

$$p_{x,\epsilon}(Y) := P\Big( \Big| \sup_{t \in T} Y_t - x \Big| \le \epsilon \Big). \tag{2.1}$$

Then the Lévy concentration function of $\sup_{t \in T} Y_t$ is defined for all
$\epsilon > 0$ as

$$p_\epsilon(Y) := \sup_{x \in \mathbb{R}} p_{x,\epsilon}(Y). \tag{2.2}$$

Likewise, define $p_{x,\epsilon}(|Y|)$ by (2.1) with $\sup_{t \in T} Y_t$
replaced by $\sup_{t \in T} |Y_t|$ and define $p_\epsilon(|Y|)$ by (2.2) with
$p_{x,\epsilon}(Y)$ replaced by $p_{x,\epsilon}(|Y|)$.

Let $X = (X_t)_{t \in T}$ be a separable Gaussian process indexed by a
semimetric space $T$ such that $E[X_t] = 0$ and $E[X_t^2] = 1$ for all
$t \in T$. Assume that $\sup_{t \in T} X_t < \infty$ a.s. Our aim here is to
obtain a qualitative bound on the concentration function $p_\epsilon(X)$. In
the trivial example where $T$ is a singleton, i.e., $X$ is a real standard
normal random variable, it is immediate to see that
$p_\epsilon(X) \asymp \epsilon$ as $\epsilon \to 0$. A non-trivial case is when
$T$ and $X$ are indexed by $n = 1, 2, \dots$, i.e., $T = T_n$ and
$X = X^n = (X_{n,t})_{t \in T_n}$, and the complexity of the set
$\{X_{n,t} : t \in T_n\}$ (in $L^2(\Omega, \mathcal{A}, P)$) is increasing in
$n$. In such cases it is typically not known whether
$\sup_{t \in T_n} X_{n,t}$ has a limiting distribution as $n \to \infty$, and
therefore it is not trivial at all whether, for any sequence
$\epsilon_n \to 0$, $p_{\epsilon_n}(X^n) \to 0$ as $n \to \infty$, which is in
fact generally not true as Example 1 in [7] shows.

Theorem 2.1. Let $X = (X_t)_{t \in T}$ be a separable Gaussian process indexed
by a semimetric space $T$ such that $E[X_t] = 0$ and $E[X_t^2] = 1$ for all
$t \in T$. Assume that $\sup_{t \in T} X_t < \infty$ a.s. Then,
$a(X) := E[\sup_{t \in T} X_t] \in [0, \infty)$ and

$$p_\epsilon(X) \le A \epsilon \, (a(X) \vee 1) \tag{2.3}$$

for all $\epsilon \ge 0$ and some absolute constant $A$.

A similar conclusion holds for the concentration function of
$\sup_{t \in T} |X_t|$.

Corollary 2.1. Let $X = (X_t)_{t \in T}$ be a separable Gaussian process indexed
by a semimetric space $T$ such that $E[X_t] = 0$ and $E[X_t^2] = 1$ for all
$t \in T$. Assume that $\sup_{t \in T} X_t < \infty$ a.s. Then,
$a(|X|) := E[\sup_{t \in T} |X_t|] \in [\sqrt{2/\pi}, \infty)$ and

$$p_\epsilon(|X|) \le A \epsilon \, a(|X|) \tag{2.4}$$

for all $\epsilon \ge 0$ and some absolute constant $A$.

We refer to (2.3) and (2.4) as anti-concentration inequalities because they
show that suprema of separable Gaussian processes cannot concentrate too
fast. The proof of Theorem 2.1 and Corollary 2.1 follows by extending the
results in [t!] where we derived anti-concentration inequalities for maxima of 
Gaussian vectors. See Appendix for a detailed exposition. 
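
To make the statement of (2.3) concrete, the following Monte Carlo sketch
estimates the Lévy concentration function of the maximum of independent
standard normal coordinates and sets it next to $\epsilon (a(X) \vee 1)$. It is
an illustration under stated assumptions only: the independent-coordinates
design, the sample sizes, and the helper name `levy_concentration` are not
taken from the paper, and the absolute constant $A$ is not specified in the
text, so the code reports the two quantities side by side rather than checking
the inequality with a particular constant.

```python
import numpy as np

def levy_concentration(samples, eps, n_centers=400):
    """Estimate sup_x P(|S - x| <= eps) from draws of S by scanning centers x."""
    s = np.sort(samples)
    centers = np.linspace(s[0], s[-1], n_centers)
    # Number of draws in [x - eps, x + eps] for every center, via binary search.
    hi = np.searchsorted(s, centers + eps, side="right")
    lo = np.searchsorted(s, centers - eps, side="left")
    return float((hi - lo).max() / s.size)

rng = np.random.default_rng(0)
eps = 0.05
results = {}
for dim in (1, 100, 2000):
    # Hypothetical index set T = {1, ..., dim} with independent N(0, 1) coordinates.
    sup_draws = rng.standard_normal((3000, dim)).max(axis=1)
    a_x = float(sup_draws.mean())      # Monte Carlo proxy for a(X) = E[sup_t X_t]
    p_eps = levy_concentration(sup_draws, eps)
    results[dim] = (p_eps, eps * max(a_x, 1.0))
    print(f"dim={dim}: p_eps ~ {p_eps:.3f}, eps * (a(X) v 1) ~ {results[dim][1]:.3f}")
```

In this design the estimated concentration function grows with the dimension,
and so does the expected supremum, so the ratio of the two reported numbers
stays of order one as the index set becomes more complex.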

3. Coupling Inequalities 

The purpose of this section is to provide two new coupling inequalities that
will be useful for the analysis of uniform confidence bands. The first
inequality is concerned with suprema of empirical processes and is a direct
corollary of Theorem 2.1 in [2,]. The second inequality is concerned with
suprema of Gaussian multiplier processes and will be obtained from a
Gaussian comparison theorem derived in 

Let $X_1, \dots, X_n$ be i.i.d. random variables taking values in a measurable
space $(S, \mathcal{S})$. Let $\mathcal{G}$ be a class of functions defined on
$S$. We assume that $\mathcal{G}$ is a separable class of measurable functions
uniformly bounded by a constant $b$ such that the covering numbers of
$\mathcal{G}$ satisfy

$$\sup_Q N(\mathcal{G}, L_2(Q), b\tau) \le (a/\tau)^v, \quad 0 < \tau < 1,$$

for some $a \ge e$ and $v \ge 1$, where the supremum is taken over all measures
$Q$ on $(S, \mathcal{S})$. We refer to function classes with these properties
as VC type with constants $a$ and $v$ and constant envelope $b$. Let $\sigma^2$
be a constant such that
$\sup_{g \in \mathcal{G}} E[g(X_1)^2] \le \sigma^2 \le b^2$. Define the
empirical process

$$\mathbb{G}_n(g) := \frac{1}{\sqrt{n}} \sum_{i=1}^n \big( g(X_i) - E[g(X_1)] \big), \quad g \in \mathcal{G},$$

and let $W_n := \|\mathbb{G}_n\|_{\mathcal{G}} := \sup_{g \in \mathcal{G}} |\mathbb{G}_n(g)|$
denote the supremum of the empirical process. Let
$B = \{B(g) : g \in \mathcal{G}\}$ be a centered tight Gaussian process with
covariance function

$$E[B(g_1) B(g_2)] = E[g_1(X_1) g_2(X_1)] - E[g_1(X_1)] E[g_2(X_1)]$$

for all $g_1, g_2 \in \mathcal{G}$. It is well known that such a process
exists. Finally, for some sufficiently large but absolute constant $A$, denote

$$K_n := A v (\log n \vee \log(ab/\sigma)).$$

The following theorem shows that $W_n$ can be well approximated by the supremum
of the corresponding Gaussian process $B$ under mild conditions on $b$,
$\sigma$, and $K_n$.

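Theorem 3.1. Consider the setting specified above. Then for any
$\gamma \in (0,1)$ one can construct on an enriched probability space a random
variable $W^0$ such that (i) $W^0 \stackrel{d}{=} \|B\|_{\mathcal{G}}$ and (ii)

$$P\left( |W_n - W^0| > \frac{b K_n}{\gamma^{1/2} n^{1/2}} + \frac{(b\sigma)^{1/2} K_n^{3/4}}{\gamma^{1/2} n^{1/4}} + \frac{b^{1/3} \sigma^{2/3} K_n^{2/3}}{\gamma^{1/3} n^{1/6}} \right) \le A \left( \gamma + \frac{\log n}{n} \right)$$

for some absolute constant $A$.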



Comment 3.1. The main advantage of the coupling provided in this theorem in
comparison with, say, the Hungarian coupling [18], which can be used to derive
a similar result, is that our coupling does not impose any side restrictions.
In particular, it does not require bounded support of $X$ and allows for point
masses on the support. In addition, if the density of $X$ exists, our coupling
does not assume that this density is bounded away from zero on the support.
Finally, our coupling does not assume that functions $g \in \mathcal{G}$ have
bounded variation. See, for example, [22] for the construction of the Hungarian
coupling and the use of the aforementioned conditions.

Let $\xi_1, \dots, \xi_n$ be i.i.d. $N(0,1)$ random variables independent of
$X_1^n := \{X_1, \dots, X_n\}$. Denote $\xi_1^n := \{\xi_1, \dots, \xi_n\}$. We
assume that the random variables $X_1, \dots, X_n, \xi_1, \dots, \xi_n$ are
defined as coordinate projections from the product probability space. Define
the Gaussian multiplier process

$$\hat{\mathbb{G}}_n(g) := \hat{\mathbb{G}}_n(X_1^n, \xi_1^n)(g) := \frac{1}{\sqrt{n}} \sum_{i=1}^n \xi_i \big( g(X_i) - E_n[g(X_i)] \big), \quad g \in \mathcal{G},$$

and for $x_1^n \in S^n$, let
$\hat W_n(x_1^n) := \|\hat{\mathbb{G}}_n(x_1^n, \xi_1^n)\|_{\mathcal{G}}$
denote the supremum of this process calculated for fixed $X_1^n = x_1^n$. In
addition, let

$$\psi_n := \sqrt{\frac{\sigma^2 K_n}{n}} + \left( \frac{b^2 \sigma^2 K_n^3}{n} \right)^{1/4} \quad \text{and} \quad \gamma_n(\delta) := \frac{1}{\delta} \left( \frac{b^2 \sigma^2 K_n^3}{n} \right)^{1/4} + \frac{1}{n}.$$

The following theorem shows that $\hat W_n(X_1^n)$ can be well approximated by
the supremum of the Gaussian process $B$ under mild conditions on $b$,
$\sigma$, and $K_n$.

Theorem 3.2. Consider the setting specified above. Assume that
$b^2 K_n \le n \sigma^2$. Then for any $\delta > 0$, there exists a set
$S_{n,0} \in \mathcal{S}^n$ such that $P(S_{n,0}) \ge 1 - 3/n$ and for any
$x_1^n \in S_{n,0}$ one can construct on an enriched probability space a random
variable $W^0$ such that (i) $W^0 \stackrel{d}{=} \|B\|_{\mathcal{G}}$ and (ii)

$$P\big( |\hat W_n(x_1^n) - W^0| > \psi_n + \delta \big) \le A \gamma_n(\delta),$$

where $A$ is an absolute constant.

Theorems 3.1 and 3.2 combined with the anti-concentration inequalities
(Theorem 2.1 and Corollary 2.1) can be used to prove the validity of the
Gaussian multiplier bootstrap for approximating distributions of suprema of
empirical processes of VC type function classes without weak convergence
arguments. This allows us to cover cases where the complexity of the function
class $\mathcal{G}$ is increasing with $n$, which is typically the case in
nonparametric problems in general and in confidence band construction in
particular. Moreover, the approximation error can be shown to be polynomially
(in $n$) small under mild conditions. In the next two sections, we will
demonstrate how to use these theorems for honest adaptive confidence band
construction.
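
The Gaussian multiplier process defined above can be simulated directly once
the function class is replaced by a finite one. The sketch below is only an
illustration under assumptions that are not in the paper: the class consists of
indicator functions $g_t(x) = 1\{x \le t\}$ on a hypothetical grid of
thresholds, the data are uniform, and the function name
`multiplier_sup_draws` is made up for this example. The data are held fixed
and only the multipliers $\xi_i$ are redrawn.

```python
import numpy as np

def multiplier_sup_draws(g_values, n_boot=2000, seed=0):
    """Draws of sup_g |n^{-1/2} sum_i xi_i (g(X_i) - E_n[g(X_i)])| for fixed data.

    g_values has shape (n, m) and holds g_j(X_i) for a finite class of m functions.
    """
    rng = np.random.default_rng(seed)
    n = g_values.shape[0]
    centered = g_values - g_values.mean(axis=0)    # g(X_i) - E_n[g(X_i)]
    xi = rng.standard_normal((n_boot, n))          # multipliers xi_1, ..., xi_n
    process = xi @ centered / np.sqrt(n)           # shape (n_boot, m)
    return np.abs(process).max(axis=1)

# Hypothetical data and function class (indicators of half-lines).
rng = np.random.default_rng(1)
x = rng.uniform(size=500)
thresholds = np.linspace(0.01, 0.99, 99)
g_values = (x[:, None] <= thresholds[None, :]).astype(float)

w_hat = multiplier_sup_draws(g_values)
q95 = float(np.quantile(w_hat, 0.95))
print(f"multiplier bootstrap 0.95-quantile of the supremum: {q95:.3f}")
```

Conditional on the data, each draw is the supremum of a centered Gaussian
vector whose covariance is the empirical covariance of the class, which is what
makes the quantile of these draws a natural estimate of the quantile of the
supremum of $B$.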

4. Analysis of confidence bands under high-level conditions 

We go back to the analysis of confidence bands. Recall that we consider the
following setting. We observe i.i.d. random variables $X_1, \dots, X_n$ with
common unknown density $f \in \mathcal{F}$ on $\mathbb{R}^d$, where
$\mathcal{F}$ is a nonempty subset of densities on $\mathbb{R}^d$. We denote by
$P_f$ the probability distribution corresponding to the density $f$. We assume
that the variables $X_i$ are defined as coordinate projections from the product
space. We will derive the theory under the following conditions.

4.1. Conditions. Let $\mathcal{X} \subset \mathbb{R}^d$ be a set of interest.
Let $\hat f_n(\cdot, l)$ be a generic estimator of $f$ with a smoothing
parameter $l \in \mathcal{L}_n$. Denote by $\sigma_{n,f}(x,l)$ the standard
deviation of $\sqrt{n} \hat f_n(x,l)$. We assume that $\sigma_{n,f}(x,l)$ is
strictly positive on $\mathcal{V}_n := \mathcal{X} \times \mathcal{L}_n$ for
all $f \in \mathcal{F}$. Define the studentized process
$Z_{n,f} = \{Z_{n,f}(v) : v = (x,l) \in \mathcal{V}_n\}$ by (1.2). To avoid a
measurability problem, we assume that the process $Z_{n,f}$ is separable.
Denote $W_{n,f} := \|Z_{n,f}\|_{\mathcal{V}_n}$. Let $c_1$ and $C_1$ be some
positive constants.

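Condition H1 (Gaussian approximation). For each $f \in \mathcal{F}$, one can
construct on an enriched probability space a sequence of random variables
$W^0_{n,f}$ such that (i)
$W^0_{n,f} \stackrel{d}{=} \|G_{n,f}\|_{\mathcal{V}_n}$, where
$G_{n,f} = \{G_{n,f}(v) : v \in \mathcal{V}_n\}$ is a centered tight Gaussian
process with $E[G_{n,f}(v)^2] = 1$ for all $v \in \mathcal{V}_n$ and
$E[\|G_{n,f}\|_{\mathcal{V}_n}] \le C_1 \sqrt{\log n}$, and (ii)

$$\sup_{f \in \mathcal{F}} P_f\big( |W_{n,f} - W^0_{n,f}| > \epsilon_{1n} \big) \le \delta_{1n},$$

where $\epsilon_{1n}$ and $\delta_{1n}$ are some sequences of positive numbers
bounded from above by $C_1 n^{-c_1}$.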

Let $\alpha \in (0,1)$ be a fixed constant (confidence level). Recall that
$c_{n,f}(\alpha)$ is the $(1-\alpha)$-quantile of the random variable
$\|G_{n,f}\|_{\mathcal{V}_n}$. If $G_{n,f}$ is pivotal, i.e., independent of
$f$, $c_{n,f}(\alpha) = c_n(\alpha)$ can be directly computed, at least
numerically. Otherwise, we have to approximate or estimate $c_{n,f}(\alpha)$.
Let $\hat c_n(\alpha)$ be a generic estimator or approximated value of
$c_{n,f}(\alpha)$. The theory in the next section assumes that
$\hat c_n(\alpha)$ is obtained via Gaussian multiplier bootstrap simulations.

Condition H2 (Estimation error of $\hat c_n(\alpha)$). For some sequences
$\tau_n$, $\epsilon_{2n}$, and $\delta_{2n}$ of positive numbers bounded from
above by $C_1 n^{-c_1}$, we have

(a) $\sup_{f \in \mathcal{F}} P_f\big( \hat c_n(\alpha) < c_{n,f}(\alpha + \tau_n) - \epsilon_{2n} \big) \le \delta_{2n}$;
(b) $\sup_{f \in \mathcal{F}} P_f\big( \hat c_n(\alpha) > c_{n,f}(\alpha - \tau_n) + \epsilon_{2n} \big) \le \delta_{2n}$.

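Let $\hat\sigma_n(x,l)$ be a generic estimator of $\sigma_{n,f}(x,l)$. Without
loss of generality, we may assume that $\hat\sigma_n(x,l)$ is nonnegative.
Condition H3 below states a high-level assumption on the estimation error of
$\hat\sigma_n(x,l)$. Verifying Condition H3 is rather standard for specific
examples.

Condition H3 (Estimation error of $\hat\sigma_n(\cdot)$). For some sequences
$\epsilon_{3n}$ and $\delta_{3n}$ of positive numbers bounded from above by
$C_1 n^{-c_1}$,

$$\sup_{f \in \mathcal{F}} P_f\left( \sup_{v \in \mathcal{V}_n} \left| \frac{\hat\sigma_n(v)}{\sigma_{n,f}(v)} - 1 \right| > \epsilon_{3n} \right) \le \delta_{3n}.$$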

We assume that the smoothing parameter
$\hat l_n := \hat l_n(X_1, \dots, X_n)$, which is allowed to depend on the
data, is chosen so that the bias can be controlled sufficiently well.
Specifically, for all $l \in \mathcal{L}_n$, define

$$\Delta_{n,f}(l) := \sup_{x \in \mathcal{X}} \frac{\sqrt{n}\, |f(x) - E_f[\hat f_n(x,l)]|}{\sigma_{n,f}(x,l)}.$$

We assume that there exists a sequence of random numbers $c'_n$, which are
known or can be calculated via simulations, that control
$\Delta_{n,f}(\hat l_n)$. The theory in the next section assumes that $c'_n$ is
chosen as a multiple of the estimated high quantile of
$\|G_{n,f}\|_{\mathcal{V}_n}$.

Condition H4 (Bound on $\Delta_{n,f}(\hat l_n)$). For some sequence
$\delta_{4n}$ of positive numbers bounded from above by $C_1 n^{-c_1}$,

$$\sup_{f \in \mathcal{F}} P_f\big( \Delta_{n,f}(\hat l_n) > c'_n \big) \le \delta_{4n}.$$

For the purposes of the analysis of the length of confidence bands, we assume
that $c'_n$ in turn can be controlled by $u_n \sqrt{\log n}$, where $u_n$ is a
sequence of positive numbers. Typically, $u_n$ is either a bounded or slowly
growing sequence.

Condition H5 (Bound on $c'_n$). For some sequences $\delta_{5n}$ and $u_n$ of
positive numbers where $\delta_{5n}$ is bounded from above by $C_1 n^{-c_1}$
and $u_n$ is bounded from below by $c_1$,

$$\sup_{f \in \mathcal{F}} P_f\big( c'_n > u_n \sqrt{\log n} \big) \le \delta_{5n}.$$

If the function class $\mathcal{F}$ to which the true density $f$ belongs is
known, it follows from the theory in the next section that one can find a
constant $C(\mathcal{F})$ depending on $\mathcal{F}$ only such that Condition
H5 can be satisfied by setting $u_n = C(\mathcal{F})$. In applications,
however, $\mathcal{F}$ is usually unknown. In these cases, one can assume that
$u_n$ is slowly growing, so that $u_n \ge C(\mathcal{F})$ in sufficiently large
samples.

Finally, we assume the following condition on the growth of
$\hat\sigma_n(\cdot, \hat l_n)$.

Condition H6 (Bound on $\hat\sigma_n(x, \hat l_n)$). There exists a function
$t : \mathcal{F} \to \mathbb{R}$ and a sequence $\delta_{6n}$ of positive
numbers bounded from above by $C_1 n^{-c_1}$ such that

$$\sup_{f \in \mathcal{F}} P_f\left( \sup_{x \in \mathcal{X}} \hat\sigma_n(x, \hat l_n)^2 > C_1 \left( \frac{\log n}{n} \right)^{-d/(2t(f)+d)} \right) \le \delta_{6n}.$$

The function $t(f)$ measures the smoothness of the density $f$. For example, if
$\mathcal{F}$ is a suitable subset of a Hölder ball, then $t(f)$ is equal to
the Hölder order of the density $f$.

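4.2. Results. Define

$$\epsilon_{n,f} := \epsilon_{1n} + \epsilon_{2n} + \epsilon_{3n} c_{n,f}(\alpha).$$

We now state the main results of this section.

Proposition 4.1. Assume that Conditions are satisfied. Consider the confidence
band $\mathcal{C}_n = \{\mathcal{C}_n(x) : x \in \mathcal{X}\}$ defined by

$$\mathcal{C}_n(x) := \big[ \hat f_n(x, \hat l_n) - s_n(x, \hat l_n), \ \hat f_n(x, \hat l_n) + s_n(x, \hat l_n) \big], \tag{4.1}$$

where

$$s_n(x, \hat l_n) := \big( \hat c_n(\alpha) + c'_n \big) \frac{\hat\sigma_n(x, \hat l_n)}{\sqrt{n}} \tag{4.2}$$

for all $x \in \mathcal{X}$. Then, we have for all $n \ge 1$ and
$f \in \mathcal{F}$,

$$P_f(f \in \mathcal{C}_n) \ge (1 - \alpha) - \delta_n - \tau_n - A \epsilon_{n,f} E[\|G_{n,f}\|_{\mathcal{V}_n}]$$

for some absolute constant $A$. In particular, since
$E[\|G_{n,f}\|_{\mathcal{V}_n}] \le C_1 \sqrt{\log n}$,
$P_f(f \in \mathcal{C}_n) \ge 1 - \alpha - C n^{-c}$ for some constants $c$ and
$C$ depending on $c_1$ and $C_1$ only.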

Comment 4.1. (i) Proposition 4.1 shows that the confidence bands in (4.1) are
asymptotically honest with level $\alpha$ for the class $\mathcal{F}$.
Moreover, since the constants $c$ and $C$ in the statement of the proposition
depend on $c_1$ and $C_1$ only, the coverage probability can be smaller than
$1 - \alpha$ only by a polynomially small term $C n^{-c}$ uniformly over the
class $\mathcal{F}$.

(ii) An advantage of Proposition 4.1 is that it does not require the
Smirnov-Bickel-Rosenblatt (SBR) condition, which is often difficult to obtain.
In particular, in the next section we will show that our proposition applies
when $\hat f_n(\cdot)$ is defined using either convolution or projection
kernels under mild conditions, and, as far as projection kernels are concerned,
covers estimators based on compactly supported wavelets, Battle-Lemarié
wavelets of any order, as well as other estimators such as those based on
orthogonal polynomials and trigonometric functions. The SBR condition for
compactly supported wavelets was obtained in [3], for Battle-Lemarié wavelets
of degree up to 4 in [14], and for Battle-Lemarié wavelets of degree higher
than 4 in [10]. To the best of our knowledge, the SBR condition for orthogonal
polynomials and trigonometric functions has not been obtained in the
literature. In addition, the SBR condition, being based on extreme value
theory, yields only a logarithmic (in $n$) rate of approximation of the
coverage probability. In contrast, our proposition gives a polynomial rate.

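Proposition 4.2. Assume that Conditions are satisfied. Consider the confidence
band $\mathcal{C}_n := \{\mathcal{C}_n(x) : x \in \mathcal{X}\}$ defined in
(4.1). Then

$$\sup_{f \in \mathcal{F}} P_f\left( \sup_{x \in \mathcal{X}} \lambda(\mathcal{C}_n(x)) > C_1 \bar c_n \frac{r_n(f)}{\sqrt{\log n}} \right) \le \delta_{2n} + \delta_{5n} + \delta_{6n}, \tag{4.3}$$

where

$$\bar c_n := c_{n,f}(\alpha - \tau_n) + \epsilon_{2n} + u_n \sqrt{\log n}$$

and

$$r_n(f) := \left( \frac{\log n}{n} \right)^{t(f)/(2t(f)+d)},$$

and where $\lambda$ denotes the Lebesgue measure on $\mathbb{R}$. In
particular, there exists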
a finite constant $L$ depending on $\alpha$, $c_1$, and $C_1$ only such that

$$\sup_{f \in \mathcal{F}} P_f\left( \sup_{x \in \mathcal{X}} \lambda(\mathcal{C}_n(x)) > L u_n r_n(f) \right) \le C n^{-c} \tag{4.4}$$

for some constants $c$ and $C$ depending on $c_1$ and $C_1$ only.

Comment 4.2. Proposition 4.2 shows that the confidence bands in (4.1) adapt to
the smoothness $t(f)$ of the density $f$. When $\mathcal{F}$ is a suitable
subset of the Hölder ball (so that $t(f)$ equals the Hölder order of $f$) and
$u_n$ is bounded, the rate of convergence of the length of the confidence bands
to zero, $u_n r_n(f)$, coincides with the minimax-optimal rate of estimation of
$f$ in the uniform metric. No additional inflating terms are required.

5. Verifying Conditions H1-H6

In this section, we show that Conditions H1-H6 used above hold for estimators
$\hat f_n(x,l)$ based on typical convolution and projection kernels under some
commonly used assumptions. Recall that the estimators are based on an i.i.d.
sample $X_1, \dots, X_n$ of observations with density $f \in \mathcal{F}$ on
$\mathbb{R}^d$. Let $\xi_1, \dots, \xi_n$ be a sequence of $N(0,1)$ random
variables that are independent of $X_1^n := \{X_1, \dots, X_n\}$. Denote
$\xi_1^n := \{\xi_1, \dots, \xi_n\}$. The set of random variables $\xi_1^n$
will be used to simulate the critical values $c(\alpha)$ and $c'_n$. Throughout
the rest of the paper, we assume that both $X_1^n$ and $\xi_1^n$ are defined as
coordinate projections from some product probability space.

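Let $\{K_l\}_{l \in \mathcal{L}_n}$ be a family of kernels where
$K_l : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ and $l$ is a smoothing
parameter. We consider estimators of the form

$$\hat f_n(x,l) := E_n[K_l(X_i, x)] = \frac{1}{n} \sum_{i=1}^n K_l(X_i, x)$$

for all $x \in \mathcal{X}$ and $l \in \mathcal{L}_n$. The variance of
$\sqrt{n} \hat f_n(x,l)$ is given by

$$\sigma_{n,f}^2(x,l) := E_f[K_l(X_1, x)^2] - E_f[K_l(X_1, x)]^2.$$

We assume that $\sigma_{n,f}^2(x,l)$ is estimated by

$$\hat\sigma_n^2(x,l) := \frac{1}{n} \sum_{i=1}^n \big( K_l(X_i, x) - \hat f_n(x,l) \big)^2 = \frac{1}{n} \sum_{i=1}^n K_l(X_i, x)^2 - \hat f_n(x,l)^2$$

for all $x \in \mathcal{X}$ and $l \in \mathcal{L}_n$, which is a sample
analogue estimator.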
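
The estimator and its sample analogue variance can be computed in a few lines.
The sketch below is an illustration only, for $d = 1$ and a convolution kernel
of the form $K_l(y,x) = 2^l K(2^l (y - x))$; the Epanechnikov kernel, the
simulated normal data, the evaluation grid, and the function names are
assumptions made for this example and are not prescribed by the paper.

```python
import numpy as np

def epanechnikov(s):
    """Compactly supported kernel of bounded variation that integrates to one."""
    return 0.75 * np.clip(1.0 - s**2, 0.0, None)

def kernel_matrix(data, grid, level):
    """K_l(X_i, x) = 2^l K(2^l (X_i - x)) for d = 1; shape (n, len(grid))."""
    scale = 2.0 ** level
    return scale * epanechnikov(scale * (data[:, None] - grid[None, :]))

def density_and_sigma(data, grid, level):
    """Return f_hat_n(x, l) and sigma_hat_n(x, l) on the grid."""
    k = kernel_matrix(data, grid, level)
    f_hat = k.mean(axis=0)                       # E_n[K_l(X_i, x)]
    var_hat = (k**2).mean(axis=0) - f_hat**2     # sample analogue of sigma^2
    return f_hat, np.sqrt(np.maximum(var_hat, 0.0))

# Hypothetical data: standard normal sample, evaluated on a grid in [-1, 1].
rng = np.random.default_rng(0)
data = rng.standard_normal(2000)
grid = np.linspace(-1.0, 1.0, 21)
f_hat, sigma_hat = density_and_sigma(data, grid, level=2)
print(f"f_hat(0) = {f_hat[10]:.3f}, sigma_hat(0) = {sigma_hat[10]:.3f}")
```

Given a critical value, the half-width of a band at $x$ would then be that
critical value times `sigma_hat / sqrt(n)`, in the spirit of (1.1).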

5.1. Conditions. We will verify Conditions H1-H6 from the previous section
under the following assumptions.

Condition L1. The function class
$\mathcal{K}_{n,f} := \{K_l(\cdot, x)/\sigma_{n,f}(x,l) : (x,l) \in \mathcal{X} \times \mathcal{L}_n\}$
is VC type with constants $a \ge e$ and $v \ge 1$ and constant envelope $b_n$
for all $f \in \mathcal{F}$. In addition,
$E_f[g(X_1)^2] \le \sigma_n^2 \le b_n^2$ for all $f \in \mathcal{F}$ and
$g \in \mathcal{K}_{n,f}$.

Condition L2. There exist strictly positive constants $c_2$ and $C_2$ such that
(i) for all $f \in \mathcal{F}$ and $l \in \mathcal{L}_n$,

$$c_2 2^{ld/2} \le \inf_{x \in \mathcal{X}} \sigma_{n,f}(x,l) \le \sup_{x \in \mathcal{X}} \sigma_{n,f}(x,l) \le C_2 2^{ld/2},$$

and (ii) $c_2 \le \sigma_n^2 \le C_2$.

Comment 5.1. In this comment, we provide primitive assumptions that suffice
for Conditions L1 and L2.

(i) Convolution kernels. Consider a kernel function
$K : \mathbb{R} \to \mathbb{R}$ that (i) has compact support, (ii) is of
bounded variation, and (iii) satisfies $\int K(s) ds = 1$. Let
$\mathcal{L}_n \subset (0, \infty)$. For $x, y \in \mathbb{R}^d$ and
$l \in \mathcal{L}_n$, define

$$K_l(y,x) := 2^{ld} \prod_{1 \le m \le d} K\big( 2^l (y_m - x_m) \big).$$

Here $2^{-l}$ can be interpreted as a bandwidth parameter. Suppose that
$\mathcal{L}_n$ is contained in the interval $[l_{\min,n}, l_{\max,n}]$ where
$l_{\min,n} \to \infty$ as $n \to \infty$. In addition, suppose that uniformly
over $f \in \mathcal{F}$,

$$f(x) \ge c \text{ for all } x \in \mathcal{X}^c \quad \text{and} \quad f(x) \le C \text{ for all } x \in \mathbb{R}^d, \tag{5.1}$$

where $\mathcal{X}^c$ is the $c$-enlargement of $\mathcal{X}$.

Then $\|K_l(\cdot, x)\|_{\mathbb{R}^d} \le C 2^{ld}$ for all
$x \in \mathbb{R}^d$ and $l \in \mathcal{L}_n$ because $K$ has compact support
and is of bounded variation. Further, there exists $n_0$ such that uniformly
over all $f \in \mathcal{F}$, $x \in \mathcal{X}$, and $l \in \mathcal{L}_n$,

$$c 2^{ld} \le \sigma_{n,f}(x,l)^2 \le C 2^{ld} \tag{5.2}$$

for all $n \ge n_0$. This implies the first part of Condition L2. Since (5.1)
gives $|E_f[K_l(X_1, x)]| \le C$ uniformly over all $f \in \mathcal{F}$,
$l \in \mathcal{L}_n$, and $x \in \mathbb{R}^d$, (5.2) also implies the second
part of Condition L2. Since the product of VC type classes is VC type, it is
also easy to check that Condition L1 holds for all $n \ge n_0$ with some $a$
and $v$ independent of $n$, $b_n \le C 2^{l_{\max,n} d/2}$. See [9] for a more
general class of kernels that satisfy Conditions L1 and L2.

(ii) Projection kernels: compactly supported wavelets. Consider a father
wavelet $\phi$, i.e., a function $\phi$ such that
$\{\phi(\cdot - k) : k \in \mathbb{Z}\}$ is an orthonormal system in
$L^2(\mathbb{R})$, the spaces
$V_j = \{\sum_k c_k \phi(2^j x - k) : \sum_k c_k^2 < \infty\}$,
$j = 0, 1, 2, \dots$, are nested, and $\bigcup_{j \ge 0} V_j$ is dense in
$L^2(\mathbb{R})$. Suppose that $\phi$ is compactly supported, bounded, and of
bounded $p$-variation for some $p \ge 1$. For example, all Daubechies' wavelets
have these properties (see [ij], p. 1613). Suppose also that there exists a
constant $c > 0$ such that for all $x \in \mathbb{R}^d$,

$$\sum_{k \in \mathbb{Z}^d} \Big( \prod_{1 \le m \le d} \phi(x_m - k_m) \Big)^2 \ge c. \tag{5.3}$$

Let $\mathcal{L}_n \subset \mathbb{N}$. For $x, y \in \mathbb{R}^d$ and
$l \in \mathcal{L}_n$, define

$$K_l(y,x) := 2^{ld} \sum_{k \in \mathbb{Z}^d} \prod_{1 \le m \le d} \phi(2^l y_m - k_m) \prod_{1 \le m \le d} \phi(2^l x_m - k_m). \tag{5.4}$$

Since $\phi$ is compactly supported and bounded, $|2^{-ld} K_l(y,x)| \le C$
uniformly over all $x, y \in \mathbb{R}^d$ and $l \in \mathcal{L}_n$.
Therefore, it follows from Lemma 2 in [12] that the function class
$\mathcal{K}^d := \{2^{-ld} K_l(\cdot, x) : l \in \mathbb{N}, x \in \mathbb{R}^d\}$
is VC type with constant envelope when $d = 1$. When $d \ge 2$, $\mathcal{K}^d$
can be represented as a product of classes $\mathcal{K}^1$ corresponding to
different coordinates of $\mathbb{R}^d$, and so it is also VC type with
constant envelope.

Suppose that $\mathcal{L}_n$ is contained in the interval
$[l_{\min,n}, l_{\max,n}]$ where $l_{\min,n} \to \infty$ as $n \to \infty$. In
addition, suppose that uniformly over all $f \in \mathcal{F}$, (5.1) holds.
Since $\phi$ has compact support, $K_l(y,x) = 0$ if $|2^l y - 2^l x| > c$ for
some constant $c > 0$. Therefore, $|E_f[K_l(X_1, x)]| \le C$ because
$2^{-ld} |K_l(X_1, x)|$ is bounded. Similarly,
$E_f[K_l(X_1, x)^2] \le 2^{ld} C$. Further, there exists $n_0$ such that for
all $n \ge n_0$ uniformly over all $f \in \mathcal{F}$, $x \in \mathcal{X}$,
and $l \in \mathcal{L}_n$,

$$E_f[K_l(X_1, x)^2] = \int_{\mathbb{R}^d} K_l(y,x)^2 f(y) dy = \int_{y : |y - x| \le 2^{-l} c} K_l(y,x)^2 f(y) dy$$
$$\ge c \int_{y : |y - x| \le 2^{-l} c} K_l(y,x)^2 dy = c \int_{\mathbb{R}^d} K_l(y,x)^2 dy$$
$$= 2^{ld} c \sum_{k \in \mathbb{Z}^d} \Big( \prod_{1 \le m \le d} \phi(2^l x_m - k_m) \Big)^2 \ge 2^{ld} c,$$

where the last inequality follows from assumption (5.3). Therefore, for all
$n \ge n_0$,

$$2^{ld} c \le \sigma_{n,f}(x,l)^2 \le 2^{ld} C,$$

which implies Condition L2 as in the case of convolution kernels. Since the
product of VC type classes is VC type, we conclude that Condition L1 also holds
for all $n \ge n_0$ with some constants $a$ and $v$ independent of $n$,
$b_n = C 2^{l_{\max,n} d/2}$.

(iii) Projection kernels: Battle-Lemarié wavelets. Consider a Battle-Lemarié
father wavelet $\phi$ of order $r \ge 1$. Suppose that (5.3) holds for some
$c > 0$. Let $\mathcal{L}_n \subset \mathbb{N}$. For $x, y \in \mathbb{R}^d$
and $l \in \mathcal{L}_n$, define $K_l(y,x)$ by (5.4). It follows from Lemma 1
in [15] that $|2^{-ld} K_l(y,x)| \le C$ uniformly over all
$x, y \in \mathbb{R}^d$ and $l \in \mathcal{L}_n$. Therefore, it follows from
Lemma 2 in that the function class
$\mathcal{K}^d := \{2^{-ld} K_l(\cdot, x) : l \in \mathbb{N}, x \in \mathbb{R}^d\}$
is VC type with constant envelope when $d = 1$. When $d \ge 2$, $\mathcal{K}^d$
can be represented as a product of classes $\mathcal{K}^1$ corresponding to
different coordinates of $\mathbb{R}^d$, and so it is also VC type with
constant envelope.

Suppose that $\mathcal{L}_n$ is contained in the interval
$[l_{\min,n}, l_{\max,n}]$ where $l_{\min,n} \to \infty$ as $n \to \infty$. In
addition, suppose that uniformly over all $f \in \mathcal{F}$, (5.1) holds.
Since $\phi$ is a Battle-Lemarié wavelet,
$2^{-ld} |K_l(y,x)| \le \exp(-c |2^l y - 2^l x|)$ for some constant $c > 0$.
Therefore, $|E_f[K_l(X_1, x)]| \le C$ because $2^{-ld} |K_l(X_1, x)|$ is
bounded. Similarly, $E_f[K_l(X_1, x)^2] \le 2^{ld} C$. Further, there
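exists $n_0$ such that for all $n \ge n_0$ uniformly over all
$f \in \mathcal{F}$, $x \in \mathcal{X}$, and $l \in \mathcal{L}_n$,

$$E_f[K_l(X_1, x)^2] = \int_{\mathbb{R}^d} K_l(y,x)^2 f(y) dy \ge \int_{y : |y - x| \le c} K_l(y,x)^2 f(y) dy$$
$$\ge c \int_{y : |y - x| \le c} K_l(y,x)^2 dy \ge c \int_{\mathbb{R}^d} K_l(y,x)^2 dy - C$$
$$= 2^{ld} c \sum_{k \in \mathbb{Z}^d} \Big( \prod_{1 \le m \le d} \phi(2^l x_m - k_m) \Big)^2 - C \ge 2^{ld} c,$$

where the last inequality follows from assumption (5.3) and the fact that all
$l \in \mathcal{L}_n$ are sufficiently large since $n \ge n_0$ and
$l_{\min,n} \to \infty$. Therefore, for all $n \ge n_0$,

$$2^{ld} c \le \sigma_{n,f}(x,l)^2 \le 2^{ld} C,$$

which implies Condition L2 as in the case of convolution kernels. Since the
product of VC type classes is VC type, we conclude that Condition L1 also holds
for all $n \ge n_0$ with some constants $a$ and $v$ independent of $n$,
$b_n = C 2^{l_{\max,n} d/2}$, and $c \le \sigma_n^2 \le C$.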

(iv) Projection kernels: other bases. Suppose that $\mathcal{X}$ equals the whole
support of $f$ for all $f \in \mathcal{F}$. (The case when $\mathcal{X}$ is a proper subset of the
support of $f$ can be handled similarly but requires a more technically involved
argument.) Let $\{\varphi_j : j = 1, \dots, \infty\}$ be an orthonormal basis
of $L^2(\mathcal{X})$, the space of square integrable functions on $\mathcal{X}$. Let
$\mathcal{L}_n \subset (0, \infty)$ be such that $2^{ld} \in \mathbb{N}$ for all $l \in \mathcal{L}_n$. For
$x, y \in \mathcal{X}$ and $l \in \mathcal{L}_n$, define

$$K_l(y,x) := \sum_{j=1}^{2^{ld}} \varphi_j(y)\varphi_j(x).$$

Here, $2^{ld}$ equals the number of series (basis) terms used in the estimation.
Suppose that $\mathcal{L}_n$ is contained in the interval $[l_{\min,n}, l_{\max,n}]$ where
$l_{\min,n} \to \infty$ as $n \to \infty$. For all $x \in \mathcal{X}$, let $\Phi_l(x)$ be a
$2^{ld} \times 2^{ld}$ matrix with $(j,k)$-th component equal to $\varphi_j(x)\varphi_k(x)$.
Suppose that all eigenvalues of $E_f[\Phi_l(X_1)]$ are bounded from above by $C$ and
from below by $c$ uniformly over all $l \in \mathcal{L}_n$ and $f \in \mathcal{F}$. In addition, assume
that $|E_f[K_l(X_1,x)]| \le C$ uniformly over all $l \in \mathcal{L}_n$ and $f \in \mathcal{F}$, and
$c2^{ld} \le \sum_{j=1}^{2^{ld}} \varphi_j(x)^2 \le C2^{ld}$ uniformly over all $l \in \mathcal{L}_n$ and
$x \in \mathcal{X}$. Then

$$E_f[(K_l(X_1,x))^2] \le C\sum_{j=1}^{2^{ld}} \varphi_j(x)^2 \le C2^{ld},$$

and so the first part of Condition L2 holds. Further, for all $x, y \in \mathcal{X}$ and $l \in \mathcal{L}_n$,



Kiiy,x) < 



and so the second part of Condition L2 follows as well.

Further, assume that $\mathcal{X}$ is compact, and that there exists $L_n$ satisfying
$\log L_n \le C\log n$ such that
$\sum_{j=1}^{2^{ld}}(\varphi_j(x) - \varphi_j(y))^2 \le L_n^2\sum_{j=1}^{d}(x_j - y_j)^2$. Then
Condition L1 holds with constants $a = CL_n2^{l_{\max,n}d/2}$ and some $v$ independent
of $n$, $b_n = C2^{l_{\max,n}d/2}$, and $c \le \sigma_n \le C$.



Following [15], we will impose the following condition restricting the
function class $\mathcal{F}$.

Condition L3. There exist $n_0 \in \mathbb{N}$ and strictly positive constants $c_3$, $C_3$, $c_4$,
and $C_4$ such that for all $n \ge n_0$, $f \in \mathcal{F}$ and $l \in \mathcal{L}_n$, there is some $t \in [c_3, C_3]$
such that

$$c_4 2^{-lt} \le \sup_{x \in \mathcal{X}} |E_f[\hat f_n(x,l)] - f(x)| \le C_4 2^{-lt}. \quad (5.5)$$

Comment 5.2. Let $r \in \mathbb{N}$. Assume that $d = 1$. In addition, assume that the
estimator $\hat f_n(\cdot)$ is constructed either from convolution or wavelet projection
kernel as described in Comment 5.1. Further, for convolution kernels, assume
that $\int K(s)s^l\,ds = 0$ for $l = 1, \dots, r-1$ and $\int K(s)|s|^r\,ds$ is finite. For
compactly supported wavelet projection kernels, assume that either $\phi$ is
$(r-1)$-regular or the corresponding mother wavelet $\psi$ satisfies $\int \psi(s)s^l\,ds = 0$
for every $0 \le l \le r-1$. For Battle-Lemarié wavelet projection kernels,
assume that a wavelet of order $r$ is used. Then it is well known that the
upper bound in Condition L3 holds if $f \in \mathcal{C}^t$ for any $t \le r$, where $\mathcal{C}^t$ is the
Hölder-Zygmund space. As far as the lower bound is concerned, [15] showed
that it holds "generically" in Hölder spaces with smoothness $t \le r$. See
the original paper for a more detailed explanation of this result.

5.2. Results. Let $K_n := Av(\log n \vee \log(ab_n))$ for some sufficiently large but
absolute constant $A$. Let $G_{n,f} := \{G_{n,f}(v) : v \in \mathcal{V}_n\}$ be a zero mean tight
Gaussian process with the same covariance structure as that of $Z_{n,f}$. It is
well known that under Condition L1 such a process exists.
For all $x \in \mathcal{X}$ and $l \in \mathcal{L}_n$, define the Gaussian multiplier process:

$$\hat G_n(x,l) := \hat G_n(X_1^n, \xi_1^n)(x,l) := \frac{1}{\sqrt{n}}\sum_{i=1}^n \xi_i\,\frac{K_l(X_i,x) - \hat f_n(x,l)}{\hat\sigma_n(x,l)}.$$

We assume that the critical value $\hat c_n(\alpha)$ is simulated as the conditional
$(1-\alpha)$-quantile of $\|\hat G_n\|_{\mathcal{V}_n}$ given $X_1^n$. Let $c_5$ and $C_5$ be some strictly positive
constants. The following theorem verifies Conditions H1-H3.

Proposition 5.1. Assume that conditions L1 and L2 hold. Assume that
$b_n^2K_n^4/n \le C_5n^{-c_5}$. Then conditions H1-H3 hold with constants $c_1$ and $C_1$
depending on $c_2$, $C_2$, $c_5$, and $C_5$ only and with the process $G_{n,f}$ and the
critical value $\hat c_n(\alpha)$ defined above. Moreover, condition H2 holds uniformly
over all $\alpha \in (0, 1)$.
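
The simulation of the critical value described above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes $d = 1$, a Gaussian convolution kernel with bandwidth $2^{-l}$, and a finite grid standing in for $\mathcal{X}$. The function names, arguments, and defaults are hypothetical.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal density, used here as an illustrative convolution kernel."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)


def multiplier_bootstrap_critical_value(data, grid, levels, alpha, n_boot=500, seed=0):
    """Simulate the conditional (1 - alpha)-quantile of sup |G_hat_n(x, l)|.

    Assumptions (not from the paper): d = 1, Gaussian kernel, bandwidth 2**(-l),
    and the supremum over x is taken over a finite grid lying inside the data range
    (so that the sample standard deviations below are strictly positive).
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    blocks = []
    for l in levels:
        h = 2.0 ** (-l)
        # k[i, j] = K_l(X_i, x_j) for a convolution kernel with bandwidth h
        k = gaussian_kernel((data[:, None] - grid[None, :]) / h) / h
        f_hat = k.mean(axis=0)      # density estimate f_hat_n(x_j, l)
        sigma_hat = k.std(axis=0)   # sample standard deviation of K_l(X_i, x_j)
        blocks.append((k - f_hat) / sigma_hat)
    z = np.concatenate(blocks, axis=1)      # shape (n, len(levels) * len(grid))
    xi = rng.standard_normal((n_boot, n))   # Gaussian multipliers, one row per draw
    g_hat = xi @ z / np.sqrt(n)             # draws of the multiplier process
    sup_stats = np.abs(g_hat).max(axis=1)   # sup over (x, l) of |G_hat_n(x, l)|
    return float(np.quantile(sup_stats, 1.0 - alpha))
```

Conditional on the data, each coordinate of `g_hat` is a centered Gaussian with unit variance, which mirrors the normalization by $\hat\sigma_n(x,l)$ in the display above. Replacing the grid by a finer one only changes how well the finite maximum approximates the supremum.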

Next, we verify conditions H4-H6 assuming that the smoothing parameter
$\hat l_n$ is chosen according to a version of Lepski's method. Specifically, let $\gamma_n$
be a sequence of strictly positive numbers converging to zero. Let $c_{n,f}(\gamma_n)$
be the $(1-\gamma_n)$-quantile of the random variable $\|G_{n,f}\|_{\mathcal{V}_n}$, and let $\hat c_n(\gamma_n)$
be an estimator of $c_{n,f}(\gamma_n)$. We assume that $\hat c_n(\gamma_n)$ satisfies condition H2 with
$\alpha$ replaced by $\gamma_n$, which holds under conditions of Proposition 5.1. For all
$l \in \mathcal{L}_n$, let $\mathcal{L}_{n,l} := \{l' \in \mathcal{L}_n : l' > l\}$. For some constant $q > 1$, which is
independent of $n$, define a "Lepski-type" estimator

$$\hat l_n := \inf\Big\{l \in \mathcal{L}_n : \sup_{l' \in \mathcal{L}_{n,l}}\sup_{x \in \mathcal{X}} \frac{\sqrt{n}\,|\hat f_n(x,l) - \hat f_n(x,l')|}{\hat\sigma_n(x,l) + \hat\sigma_n(x,l')} \le q\hat c_n(\gamma_n)\Big\}. \quad (5.6)$$

Previous literature on Lepski's estimator used Talagrand's inequality
combined with some bounds on expectations of suprema of certain empirical
processes (obtained via either entropy methods or Rademacher averages)
to choose the threshold level for the estimator (the right hand side of the
inequality in (5.6)); see [13] and [15]. Because of the one-sided nature of
the aforementioned inequalities, however, it was argued that the resulting
threshold turned out to be too high, leading to limited applicability of the
estimator in small and moderate samples. In contrast, an advantage of our
construction is that we use $q\hat c_n(\gamma_n)$ as a threshold level, which is essentially
the minimal possible value of the threshold that suffices for good properties
of the estimator. The analysis of theoretical consequences of our construction
beyond the fact that it is sufficient for our results is out of the scope of
this paper.
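
As a rough illustration of the selection rule in (5.6), the sketch below scans a finite, increasing list of candidate levels and returns the smallest one that passes the pairwise comparison against all larger levels. It is a simplified sketch under stated assumptions, not the authors' code: the supremum over $x$ is replaced by a maximum over a finite grid, and the inputs `f_hat`, `sigma_hat`, and `crit` (standing for $\hat f_n$, $\hat\sigma_n$, and $\hat c_n(\gamma_n)$) are assumed to be precomputed. All names are hypothetical.

```python
import numpy as np


def lepski_select(f_hat, sigma_hat, levels, n, q, crit):
    """Return the smallest level l passing the Lepski-type comparison.

    f_hat, sigma_hat: arrays of shape (len(levels), len(grid)) with the density
    estimates and their standard deviation estimates on a finite grid.
    levels: increasing sequence of candidate levels.
    crit: value playing the role of c_hat_n(gamma_n); q > 1 is the constant in (5.6).
    """
    f_hat = np.asarray(f_hat, dtype=float)
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    if len(levels) == 0:
        raise ValueError("levels must be non-empty")
    for i, level in enumerate(levels):
        passes = True
        for j in range(i + 1, len(levels)):  # all l' > l
            stat = np.max(
                np.sqrt(n) * np.abs(f_hat[i] - f_hat[j]) / (sigma_hat[i] + sigma_hat[j])
            )
            if stat > q * crit:
                passes = False
                break
        if passes:
            return level
    # Defensive fallback; not reached, since the largest level has no l' > l to fail against.
    return levels[-1]
```

Because the largest level is compared against an empty set, the rule always returns some level; how small the selected level is depends on the threshold $q\hat c_n(\gamma_n)$, which is the point of the discussion above.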

Let $u_n$ be a sequence of positive numbers such that $u_n$ is sufficiently large
for large $n$. See the theorem below for the exact requirements on $u_n$. We
set $c'_n := u_n\hat c_n(\gamma_n)$. The following theorem verifies Conditions H4-H6.

Proposition 5.2. Assume that conditions L1-L3 hold. In addition,
assume that $b_n^2K_n^4/n \le C_5n^{-c_5}$, $\gamma_n \le C_5n^{-c_5}$, $|\log\gamma_n| \le C_5\log n$, and
$u_n \ge c_5\log n$. Finally, assume that $\mathcal{L}_n$ is closed and that there exists $s > 0$ such
that for any $l \in \mathcal{L}_n$, there either exists $l' \in \mathcal{L}_n$ satisfying $l' \in (l-s, l)$ or there
is no $l' \in \mathcal{L}_n$ such that $l' < l$. Then conditions H4-H6 hold with $\hat l_n$ and $c'_n$
defined above and constants $c_1$ and $C_1$ depending on $\{(c_j, C_j) : 2 \le j \le 5\}$
only.

Appendix A. Technical Tools 

Theorem A.1. Let $X$ and $Y$ be zero-mean Gaussian $p$-vectors with covariances
$\Sigma^X$ and $\Sigma^Y$ correspondingly. Then for any $g \in C^2(\mathbb{R})$,

$$\Big|E\Big[g\Big(\max_{1\le j\le p}X_j\Big)\Big] - E\Big[g\Big(\max_{1\le j\le p}Y_j\Big)\Big]\Big| \le \|g''\|_\infty\Delta/2 + 2\|g'\|_\infty\sqrt{2\Delta\log p},$$

where $\Delta = \max_{1\le j,k\le p}|\Sigma^X_{jk} - \Sigma^Y_{jk}|$.

Proof. See Theorem 1 in [7]. □

Theorem A.2. Let $\mu$ and $\nu$ be Borel probability measures on $\mathbb{R}$. Let $\varepsilon > 0$
and $\delta > 0$. Suppose that $\mu(A) \le \nu(A^\delta) + \varepsilon$ for every Borel subset $A$ of $\mathbb{R}$.
Let $V$ be a random variable with distribution $\mu$. Then there is a random
variable $W$ with distribution $\nu$ such that $P(|V - W| > \delta) \le \varepsilon$.

Proof. See Lemma 4.1 in [6]. □

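Theorem A.3. Let $\beta > 0$ and $\delta > 1/\beta$. For every Borel subset $B$ of $\mathbb{R}$,
there is a smooth function $g : \mathbb{R} \to \mathbb{R}$ and absolute constant $A > 0$ such that
$\|g'\|_\infty \le \delta^{-1}$, $\|g''\|_\infty \le A\beta\delta^{-1}$, and for all $t \in \mathbb{R}$,

$$(1-\varepsilon)1_B(t) \le g(t) \le \varepsilon + (1-\varepsilon)1_{B^{3\delta}}(t),$$

where $\varepsilon = \varepsilon_{\beta,\delta}$ is given by

$$\varepsilon = \sqrt{e^{-\alpha}(1+\alpha)} < 1, \quad \alpha = \beta^2\delta^2 - 1.$$

Proof. See Lemma 4.2 in [6]. □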
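
To get a feel for how small $\varepsilon_{\beta,\delta}$ becomes, the following sketch evaluates the formula $\varepsilon = \sqrt{e^{-\alpha}(1+\alpha)}$ with $\alpha = \beta^2\delta^2 - 1$. It assumes $\beta\delta > 1$ so that $\alpha > 0$; the function name is hypothetical. For a choice of the form $\beta = 2\sqrt{\log n}/\delta$, such as the one that appears in Step 3 of the coupling proof, one gets $\alpha = 4\log n - 1$ and hence $\varepsilon = \sqrt{4e\log n}/n^2$.

```python
import math


def smoothing_epsilon(beta, delta):
    """Evaluate epsilon = sqrt(exp(-alpha) * (1 + alpha)) with alpha = beta^2 delta^2 - 1.

    Requires beta * delta > 1 so that alpha > 0 (an assumption of this sketch).
    """
    if beta <= 0 or beta * delta <= 1.0:
        raise ValueError("need beta > 0 and beta * delta > 1")
    alpha = beta ** 2 * delta ** 2 - 1.0
    return math.sqrt(math.exp(-alpha) * (1.0 + alpha))


# Example: beta = 2 * sqrt(log n) / delta, so that beta * delta does not depend on delta.
for n in (10, 100, 1000):
    delta = 0.1
    beta = 2.0 * math.sqrt(math.log(n)) / delta
    print(n, smoothing_epsilon(beta, delta))
```

The value depends on $\beta$ and $\delta$ only through the product $\beta\delta$, which is why $\delta$ can be tuned separately once $\beta\delta$ is fixed.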



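Theorem A.4. Let $\xi_1, \dots, \xi_n$ be i.i.d. random variables taking values in
a measurable space $(S, \mathcal{S})$. Suppose that $\mathcal{G}$ is a nonempty, pointwise
measurable class of functions on $S$ uniformly bounded by a constant $b$ such that
there exist constants $a \ge e$ and $v > 1$ with $\sup_Q N(\mathcal{G}, L_2(Q), b\varepsilon) \le (a/\varepsilon)^v$
for all $0 < \varepsilon \le 1$. Let $\sigma^2$ be a constant such that $\sup_{g\in\mathcal{G}}\mathrm{var}(g) \le \sigma^2 \le b^2$.
If $b^2v\log(ab/\sigma) \le n\sigma^2$, then for all $t \le n\sigma^2/b^2$,

$$P\Big[\sup_{g\in\mathcal{G}}\Big|\sum_{i=1}^n\{g(\xi_i) - E[g(\xi_1)]\}\Big| > A\sqrt{n\sigma^2\Big\{t \vee \Big(v\log\frac{ab}{\sigma}\Big)\Big\}}\Big] \le e^{-t},$$

where $A > 0$ is an absolute constant.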



Proof. This version of Talagrand's inequality follows from Theorem 3 in [21]
combined with a bound on expected values of suprema of empirical processes
derived in 0]. □

Theorem A.5. Let $Y := \{Y(t) : t \in T\}$ be a zero mean separable Gaussian
process such that $E[Y(t)^2] = 1$ for all $t \in T$. Let $c(\alpha)$ denote the
$(1-\alpha)$-quantile of $\|Y\|_T$. Assume that $E[\|Y\|_T] < \infty$. Then
$c(\alpha) \le E[\|Y\|_T] + \sqrt{2|\log\alpha|}$ and $c(\alpha) \le M(\|Y\|_T) + \sqrt{2|\log\alpha|}$ for all $\alpha \in (0, 1)$,
where $M(\|Y\|_T)$ is the median of $\|Y\|_T$.

Proof. Pick any $\alpha \in (0, 1)$. Since $E[Y(t)^2] = 1$ for all $t \in T$, Borell's
inequality (see Theorem A.2.1 in [27]) gives for all $r > 0$,

$$P\{\|Y\|_T \ge E[\|Y\|_T] + r\} \le e^{-r^2/2}.$$

Setting $r = \sqrt{2|\log\alpha|}$ gives

$$P\{\|Y\|_T \ge E[\|Y\|_T] + \sqrt{2|\log\alpha|}\} \le \alpha.$$

This implies that $c(\alpha) \le E[\|Y\|_T] + \sqrt{2|\log\alpha|}$. The result with $M(\|Y\|_T)$
follows similarly because Borell's inequality also applies with $M(\|Y\|_T)$
replacing $E[\|Y\|_T]$. □
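
The quantile bounds of Theorem A.5 can be checked by simulation in a simple special case. The sketch below is only an illustration under assumptions not made in the text: the process is a vector of $p$ independent standard normal coordinates (so $T$ is finite and $E[Y(t)^2] = 1$ holds), and the mean, median, and quantile of $\|Y\|_T$ are replaced by Monte Carlo estimates. The function name is hypothetical.

```python
import numpy as np


def quantile_bound_check(p, alpha, n_sims=20000, seed=0):
    """Compare the (1 - alpha)-quantile of max_j |Y_j| with the two bounds of Theorem A.5.

    Y has p independent N(0, 1) coordinates (an illustrative special case).
    Returns (empirical quantile, mean-based bound, median-based bound).
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal((n_sims, p))
    sup_norm = np.abs(y).max(axis=1)                  # draws of ||Y||_T
    slack = np.sqrt(2.0 * abs(np.log(alpha)))         # sqrt(2 |log alpha|)
    c_alpha = float(np.quantile(sup_norm, 1.0 - alpha))
    mean_bound = float(sup_norm.mean() + slack)
    median_bound = float(np.median(sup_norm) + slack)
    return c_alpha, mean_bound, median_bound
```

In this special case the bounds are typically far from tight, which is consistent with their role in the proofs: they are used only to show that critical values are of order $\sqrt{\log n}$.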

Appendix B. Proofs for Section 2

Proof of Theorem 2.1. The fact that $a(X) < \infty$ follows from the Landau-Shepp-Fernique
theorem (see, for example, Lemma 2.2.5 in [8]). Since $\sup_{t\in T}X_t \ge X_{t_0}$
for any fixed $t_0 \in T$, $a(X) \ge E[X_{t_0}] = 0$. We now prove (2.3).

Since the Gaussian process $X = (X_t)_{t\in T}$ is separable, there exists a sequence
of finite subsets $T_n \subset T$ such that $Z_n := \max_{t\in T_n}X_t \to \sup_{t\in T}X_t =: Z$
a.s. as $n \to \infty$. Fix any $x \in \mathbb{R}$. Since $|Z_n - x| \to |Z - x|$ a.s. and a.s.
convergence implies weak convergence, there exists an at most countable
subset $\mathcal{N}_x$ of $\mathbb{R}$ such that for all $\varepsilon \in \mathbb{R}\setminus\mathcal{N}_x$,

$$\lim_{n\to\infty}P(|Z_n - x| \le \varepsilon) = P(|Z - x| \le \varepsilon).$$

But by Theorem 3 in [7],

$$P(|Z_n - x| \le \varepsilon) \le 4\varepsilon\Big(E\Big[\max_{t\in T_n}X_t\Big] + 1\Big) \le 4\varepsilon(a(X) + 1)$$

for all $x \in \mathbb{R}$ and $\varepsilon > 0$. Therefore,

P(|Z-x| <e) <^e(a(X) VI) (B.l) 

for all $x \in \mathbb{R}$ and $\varepsilon \in \mathbb{R}\setminus\mathcal{N}_x$. By right continuity of $P(|Z - x| \le \cdot)$, it follows
that (B.1) holds for all $\varepsilon > 0$, and so (2.3) holds as well. This completes the
proof of the theorem. □

Proof of Corollary 2.1. The proof is analogous to that of Theorem 2.1 and
therefore is omitted. □
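
The finite-dimensional anti-concentration bound used in the proof, $P(|Z_n - x| \le \varepsilon) \le 4\varepsilon(E[\max_t X_t] + 1)$, can be checked numerically in a simple case. The sketch below is illustrative only and rests on assumptions not in the text: the Gaussian vector has $p$ independent standard normal coordinates, the supremum over $x$ is taken over a finite grid of centers, and all probabilities and expectations are Monte Carlo estimates. The function name is hypothetical.

```python
import numpy as np


def anti_concentration_check(p, eps, n_sims=50000, seed=0):
    """Compare sup_x P(|Z - x| <= eps) with 4 * eps * (E[Z] + 1) for Z = max of p N(0,1).

    Returns (empirical concentration value, bound).
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_sims, p)).max(axis=1)
    z_sorted = np.sort(z)
    centers = np.linspace(z_sorted[0], z_sorted[-1], 201)
    # Number of draws in [x - eps, x + eps] for each center x, via binary search
    upper = np.searchsorted(z_sorted, centers + eps, side="right")
    lower = np.searchsorted(z_sorted, centers - eps, side="left")
    concentration = float(((upper - lower) / n_sims).max())
    bound = float(4.0 * eps * (z.mean() + 1.0))
    return concentration, bound
```

The bound is informative only for small $\varepsilon$, since its right-hand side exceeds one once $\varepsilon$ is of order $1/E[\max_t X_t]$.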

Appendix C. Proofs for Section 3

Proof of Theorem 3.1. The proof consists of applying Theorem 2.1 in [6].
Standard calculations show that for any e G (0, 1), 

J{e) := r sup + log N{g,L2{Q),bT)dT < Ce^/\og{a/ey . 
Jo Q 

For some sufficiently large but absolute constant C, let k := C{b^Kn/n + 
6cr2)V3. Lemma 2.2 in 01 implies that > E[supggg ^"^^ |c/(Xi)|3]. 
Let e = (T/(6nV2). Then i/„(e) := log{supQ N{g, L2{Q), be) V n) < K„ 
and J{e) < CaKn"^ / {bri^/'^). Note that selecting C in the definition of k 

sufficiently large yields b/ k < ^^^^^n^l'^Hn ■ Therefore, with this choice 
of parameters, the claim of the theorem follows by applying Theorem 2.1 in 
0] (with some intermediate calculations taken from Lemma 2.2 in [6]) using 
the facts that Kn > 1, > cr, and 7 < 1. □ 
Proof of Theorem 3.2. Define $\mathcal{G}\cdot\mathcal{G} := \{g\cdot\tilde g : g, \tilde g \in \mathcal{G}\}$ and
$(\mathcal{G}-\mathcal{G})^2 := \{(g-\tilde g)^2 : g, \tilde g \in \mathcal{G}\}$. It is easy to see that $\mathcal{G}\cdot\mathcal{G}$ is VC type with constants
$2a$ and $2v$ and constant envelope $b^2$, and $(\mathcal{G}-\mathcal{G})^2$ is VC type with constants
$4a$ and $4v$ and constant envelope $4b^2$. In addition, $E[g^2] \le b^2\sigma^2$ for all
$g \in \mathcal{G}\cdot\mathcal{G}$ and $E[g^2] \le 16b^2\sigma^2$ for all $g \in (\mathcal{G}-\mathcal{G})^2$. Together with the condition
$b^2K_n \le n\sigma^2$, which is assumed, this justifies an application of Talagrand's
inequality (Theorem A.4) with $t = \log n$, which gives

P fsup \W.n[g{X,)] - E[g{X,)]\ < \[?^] > 1 - ^, (C.l) 




^sup^ |E„[5(X,)] - E[giX,)]\ < \l 1 > 1 " T' (C-2) 

sup \&n[9{X,)\ - Eb(Xi)]| < > 1 - r- (C-3) 



n 



Let $S_{n,0} \subset S^n$ be the intersection of events in (C.1)-(C.3). Then
$P(S_{n,0}) \ge 1 - 3/n$.

Fix any $x_1^n \in S_{n,0}$. Let $\tau = \sigma/(bn^{1/2})$, and let $\{g_1, \dots, g_N\} \subset \mathcal{G}$ be a subset
of elements of $\mathcal{G}$ such that for any $g \in \mathcal{G}$ there exists $j = j(g) \in \{1, \dots, N\}$
such that $E[(g(X_1) - g_j(X_1))^2] \le b^2\tau^2$. We may and will assume that
$N \le (a/\tau)^v$. Define

W{x-^){t) := max |G„(x^,er)(5,)l, 

1<3<N 

W^(t) := max \B(gA\. 

In addition, define := \\B\\g and 

G{t) ■.= {g-~g:g,g^G, E[{g{X,) - g{X,)f] < b^r^} . 

Clearly,wehave|VF'„(2;^)-H^(x^)(r)| < \\Gn{x'^,Ci )\\g{T) and | VF°-VF°(t)| < 
The rest of the proof consists of 3 steps. Steps 1 and 2 pro- 
vide bounds on ||Gn(3;", and respectively. Step 3 gives 
a coupling inequality and finishes the proof using a method for comparing 
Ty(xy)(r) and W°{t). 

Step 1 (Bound on ||G„(x", Here we show that with probability at 
least 1 — 2/n, 

iiGn(x?,cr)ibw < + \-^) = 

Note that 

sup \¥.n[g{xif]-^„[g{x.{)f\< sup Ka[9{xi?]--=D{T). 
a&g{r) g&g{r) 
Then $D(\tau) \le \rho_1 + \rho_2 \le \sigma^2/n + \sqrt{b^2\sigma^2K_n/n}$ because



pi := sup Eb(Xi)2] <6V = aVn, 

9ee(r) 

P2 := sup |E„[5(xi)^] -E[5(Xi)2]| < sup \En[g{x^)] - E[g{Xi 



n. 



By Borell's inequality (see Theorem A.2.1 in [27]), with probability at least
$1 - 2/n$,



iiG„w,er)iig(r) < E [iiG„(x?,er)iigHj + Vmr)^. 

Further, E[||G„(:e^ erOIIgM] < C{n + r2) where 

1 " 



ri 
r2 



E 



sup 

96g(r) 



sup |E„[g(xi)]|. 

9Ge(r) 



To bound ri, let = a/{bn'^/'^) + (cj^i^^/Cfe^n))!/^. Note that ^/D(T)/b < tp 
and (/? < 1 + (Kn/n)^/^ < 2 < a. So, by Corollary 2.2.8 in (13], 







n < Cb I JsuplogN{g,L2{Q),be)de < Cbipy/log{a/ip)^ 

'b^a^Kn 



To bound r2, we have 



+ 



n 



n 



T2 < 2 sup \En[g{xi)] - E[g{Xi)]\ + sup \E[g{Xi)]\ < Sy^a^Kn/n. 

Combining these inequalities and increasing constant A in the definition of 
Kn gives the claim of step 1. 

Step 2 (Bound on ||i?||g(r)y)- We show that with probability at least 1 — 2/n, 

\\B\ 



\g{r) 



< 



n 



By Borell's inequality, with probability at least 1 — 2/n, 



<E[p||g(,)]+6rv/21ogn. 

By Corollary 2.2.8 in [27], 

E[]]-B]]g(,)] <Cb [ /suplogN{g,L2iQ),be)de < Cbr ^log a/ t. 
Jo \ Q 



'0 y Q 

Substituting $\tau = \sigma/(bn^{1/2})$ into these inequalities gives the claim of step 2.
Step 3 (Coupling Inequality). This is the main step of the proof. Let $\delta > 0$
and $\beta = 2\sqrt{\log n}/\delta$. Then



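Take any Borel subset $B$ of $\mathbb{R}$ and apply Theorem A.3 to define a function
$f$ corresponding to the set $B^{\psi_n}$, the $\psi_n$-enlargement of the set $B$, with chosen
$\beta$ and $\delta$. We have for all $t \in \mathbb{R}$,

$$(1-\varepsilon)1_{B^{\psi_n}}(t) \le f(t) \le \varepsilon + (1-\varepsilon)1_{B^{\psi_n+3\delta}}(t).$$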



Further, 



where 



A := sup |Ag,,3,| < C\l — - 



^91,92 := {'^n[gi{xi)g2{xi)] -'En[gi{xi)]En[g2{xi)]) 

- (E[5i(Xi)52(Xi)] - E[5i(Xi)]E[52(Xi)]) . 

So, applying Theorem lA.ll to W{xi){t) and W^(t) with chosen / gives 



E[/(W^K)(r))] - E[/(H^'^(t))] < -W + - 

' ' V n \ n 

We will assume that b'^a'^K'^/{n5'^) < 1 (otherwise, the bound claimed in 
the statement of the theorem is trivial). Then 

/l2 2;^3\ 1/4 

\E[f{W{x^){T))]-E[f{W\T))]\ < - [-^) < CiniS). 
Therefore, 

E[lB(W^nW))] < E[lB^„(VF(x^)(r))]+2/n 

< E[f{W{xrKrW{l-e)+2/n 

< E[/(TyO(T))]/(l-£) + C7n(5) 

< E[l5^„+3.(Ty°(T))]+C7n(5) 

< E[1b2^„+m(W°)] + C77„((5), 

where $C$ is varying from line to line. The claim of the theorem follows by
applying Theorem A.2. □

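Appendix D. Proofs for Section 4

Proof of Proposition 4.1. Pick any $f \in \mathcal{F}$. By the triangle inequality, we
have for any $x \in \mathcal{X}$,

$$\frac{\sqrt{n}\,|\hat f_n(x,\hat l_n) - f(x)|}{\hat\sigma_n(x,\hat l_n)} \le |Z_{n,f}(x,\hat l_n)|\,\frac{\sigma_{n,f}(x,\hat l_n)}{\hat\sigma_n(x,\hat l_n)} + \Delta_{n,f}(\hat l_n),$$

by which we have

$$P_f\{f(x) \in \mathcal{C}_n(x), \forall x \in \mathcal{X}\} \ge P_f\Big\{|Z_{n,f}(x,\hat l_n)|\,\frac{\sigma_{n,f}(x,\hat l_n)}{\hat\sigma_n(x,\hat l_n)} + \Delta_{n,f}(\hat l_n) \le \hat c_n(\alpha) + c'_n, \forall x \in \mathcal{X}\Big\}. \quad (D.1)$$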

By Condition H4, $P_f\{\Delta_{n,f}(\hat l_n) > c'_n\} \le \delta_{4n}$, and so the probability in (D.1) is
bounded from below by

P/ sup T- — < C„(a), Vx £ X ] - (54n, 

\x(^X an{x,ln) ) 

which is in turn bounded from below by 

/ \Uv)-'EAfniv)\\ ^ - i , 

Pf sup < Cn[a),\/x € X \ - din 

>l-a-Tn-Sn- AenjE[\\Gnj\\Vn] 

where the second line follows from Lemma ID. II Combining inequalities 
above completes the proof of the first claim. 

To prove the second claim, note that 5„ < Cn~^ and r„ < Cn~'^ by con- 
ditions Further, by Markov's inequality, c„j(a) < E[||G„j||v^]/a < 
C\/\og n, and so e„j < Cn~^. Therefore, the second claim follows from the 
first claim. □ 



Proof of Proposition \4-^ By construction, 
sup A(C„(x)) = 2(c„(a) + 

Therefore, ()4.3p follows from Conditions and HSl 

We now prove (4.4). Since $\tau_n$ and $\varepsilon_{2n}$ are both bounded by $C_1n^{-c_1}$
(Condition H2), there exists $n_0$ such that $\tau_n \le \alpha/2$ and $\varepsilon_{2n} \le 1$ for $n \ge n_0$. For
$n < n_0$, (4.4) holds by choosing sufficiently large $C$. Consider $n \ge n_0$. Then

$$c_{n,f}(\alpha - \tau_n) + \varepsilon_{2n} \le c_{n,f}(\alpha/2) + 1.$$

By Theorem [All c„,/(a/2) < E[||G„j||vJ + x/2| log(a/2)| . By Condition 
HH E[||G„j||v„] > 1, and so c„j(q;/2) + 1 < CE[||G„j||v„] for some constant 
C depending on a only. Further, combining E[||G„j||v„] > 1 and u„ > ci 
(Condition gives 



n 



where L depends on a, ci and Ci only. Substituting this expression into 
(j4.3p yields (j4.4l) . which concludes the proof of the proposition. □ 
\Uv)-'^f[Uv) 






V f sup 



> Cn{a) 



<a + Tn + 6in + S2n + 53n + M^ln + e2n + e3„C,ij(a))E[||G„ J ||v„] 

/or some absolute constant A. 

Proof of Lemma \D.1[ Recall that Wnj = \\Znj\\vn- Using Conditions H2l 
and ^31 shows that probability in the statement of the lemma is bounded 
from above by 



Ff sup 



\fn{v) -EfiUv) 



< cJa) 



> Pf {WnJ < (1 - e3n)Cnj{a + T„) - e2„) - 52n - S^n- (D.2) 

Using the Gaussian approximation assumed in Condition we now have 
(I-D-2P = P/ < (1 - e3n)c„,/(a + r^) - ei„ - e2n} - (^in - '52n - Ssn- 

Recalling the definition of the concentration function, the probability in the 
expression above is bounded from below by 

P/«,/ < Cn,f{a + Tn)-enj} > ?/ «,/ < c„j(a + r„)} -p,^J\Gn,f\) 

> 1- a-Tn-pi„f{\Gnj\). 

Applying Corollary [27T] to bound pi^ gives the asserted claim. □ 

Appendix E. Proofs for Section 5

Proof of Proposition 5.1. There exists $n_0$ such that $c_2 \ge C_5n^{-c_5}$ for all
$n \ge n_0$. For $n < n_0$, set $\delta_{1n} = \delta_{2n} = \delta_{3n} = 1$. Then Conditions H1-H3
hold for these $n$'s by choosing sufficiently large $C_1$ and sufficiently small $c_1$.
Consider $n \ge n_0$. Condition H1 follows from Theorem 3.1 and Corollary
2.2.8 in [27]. Consider Condition H3. Note that



(T„(X,/) 



< 



(E.l) 



Define /C^ ^ := {g : g G K^nj}- Given the definition of /), the RHS of 
(jE.ip is bounded by 



sup \En[g{Xi 



^ n 



E[g{X^)]\+ sup \En[g{Xi)]^ 



E[5(Xi)]2|. (E.2) 



It is easy to check that the function class /C^ ^ is VC type with constants 
2 A and V and constant envelope 6^. Moreover, for all g G /C^ j, 

E[9{X^f] < blE[giX,)] < blal 
Therefore, Talagrand's inequality (Theorem A.4) with $t = \log n$, which can
be applied because $b_n^2K_n/n \le b_n^2K_n^4/n \le C_5n^{-c_5} \le c_2 \le \sigma_n^2$, gives



P ( sup |E„b(X,)] - Eb(Xi)]| > < i. (E.3) 

^^^Ij 2 V n y n 

In addition,

$$\sup_{g\in\mathcal{K}_{n,f}}|E_n[g(X_i)]^2 - E[g(X_1)]^2| \le 2b_n\sup_{g\in\mathcal{K}_{n,f}}|E_n[g(X_i)] - E[g(X_1)]|,$$

and so another application of Talagrand's inequality yields



P( sup |E„b(X0]2-Eb(Xi)]2| >ij^^^] <i. (E.4) 



n I n 



Given that $b_n^2\sigma_n^2K_n/n \le Cn^{-c}$ for some $c, C > 0$, combining (E.1)-(E.4)
gives Condition H3 with $\varepsilon_{3n} := (b_n^2\sigma_n^2K_n/n)^{1/2}$ and $\delta_{3n} := 2/n$.
Finally, we verify Condition H2. Define

and 

AG„(x, /) = Gn{x, I) - Gn{x, I). 
In addition, for all x" G R"'^, define 

:= sup G„(x^Ci")(x,/), 

:= sup G„(x^Cr)(^,0- 

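Consider the set $S_{n,1} \subset \mathbb{R}^{dn}$ of values $X_1^n$ such that whenever $X_1^n \in S_{n,1}$,
$|\hat\sigma_n(x,l)/\sigma_{n,f}(x,l) - 1| \le \varepsilon_{3n}$ for all $(x,l) \in \mathcal{X} \times \mathcal{L}_n$. Calculations above
show that $P(S_{n,1}) \ge 1 - \delta_{3n} = 1 - 2/n$. Pick any $x_1^n \in S_{n,1}$. Then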

is a zero-mean Gaussian process with 

al{x,l) / an{x,l) -,^^/^^2 
0-2 (x,0 Vo-n(a;,0 

whenever e3„ < 1/2. From now on, we will assume that n is sufficiently 
large so that e^n < 1/2- Further, let /Cn,/ := {off : a G (0, l],g G /Cn,/}- It is 
easy to check that /C„j is VC type with constants 2A and 2V and constant 
envelope 6„. It is also easy to check that uniform covering numbers of 
the process AGri(x",^") with respect to the natural (standard deviation) 



var(AG„(x^^]^)(x,/)) = - 1 < O^L 
semimetric are bounded by uniform covering numbers of the function class 



1/2 

ICnj- So, applying Theorem A. 2. 7 in [271] gives for all A > Kn £3. 



P(|#„K)-Ty„(x'^)| > a) < p( sup |AG„K,^n(^>OI > A) 

< exp(K„ - AVlSeL)- 

Recall that e^^ = bf^a'^Kn/n. Since K„ > Clogn and b'^a'^K^/n < Cn^'^, 
it follows that there exists A such that 

A < Cn""^ and P(|W'„(x?) - Wn{x'{)\ > A) < Cn''' (E.5) 

uniformly over G M"'^. By Theorem [32] and condition blK^/n < Cin-''\ 
there exists a measurable set S'„,2 C M"'^ such that P(5n,2) > 1 — 3/n 
and for any x" G 5„.o one can construct a random variable VF*^ such that 

= IIG^jllvn and there exists (possibly different) A such that 

A < Cn-^ and F{\Wn{xD - W°\ > A) < Cn-" (E.6) 

uniformly over x" G 5'„^2- Combining ([E.SP and ([E.6P shows that uniformly 
over G 5„,o := -S*™,: n 5^,2, 

P(|W^nW) - > A) < Cn-" 

for some A < Cn~'^. 

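Let $\hat c_n(\alpha, x_1^n)$ be the conditional $(1-\alpha)$-quantile of $\|\hat G_n(x_1^n, \xi_1^n)\|_{\mathcal{V}_n}$. Then
$\hat c_n(\alpha) = \hat c_n(\alpha, X_1^n)$ and for any $x_1^n \in S_{n,0}$, we have

$$\begin{aligned}
P\{\|G_{n,f}\|_{\mathcal{V}_n} \le \hat c_n(\alpha, x_1^n) + \lambda\}
&= P\{W^0 \le \hat c_n(\alpha, x_1^n) + \lambda\} \\
&\ge P[\{W^0 \le \hat c_n(\alpha, x_1^n) + \lambda\} \cap \{|\hat W_n(x_1^n) - W^0| \le \lambda\}] \\
&\ge P\{\hat W_n(x_1^n) \le \hat c_n(\alpha, x_1^n)\} - Cn^{-c} \\
&\ge 1 - \alpha - Cn^{-c},
\end{aligned}$$

by which we have $\hat c_n(\alpha) \ge c_{n,f}(\alpha + Cn^{-c}) - \lambda$ whenever $X_1^n \in S_{n,0}$, which
happens with probability at least $1 - 5/n$. This completes the proof of part
(a) of Condition H2. Part (b) of the condition follows similarly. □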

Proof of Proposition 5.2. In the proof of this proposition, we will assume
that Conditions H1-H3 hold. Moreover, we will assume that Condition H2
holds uniformly over all $\alpha \in (0, 1)$. Indeed, these conditions are verified in
Proposition 5.1 under weaker assumptions than those imposed here.

Fix / G By condition L[3l there exists t G [03 , C3] such that equation 
(|5.5p holds with these / and t. In fact, it is easy to see that t is defined 
uniquely. This defines the function f : — t- M appearing in Condition HH 

By Condition Hll e^n is bounded by Cin^^^. So, there exists no such that 
esn < 1/2 for all n > no. Let S^n = = 1 for n < no, so that Conditions 
HUand HUhold these n's with Ci sufficiently large and ci sufficiently small. 
Consider n > riQ. 

Let m > s be such that C42("'-*)'=3 > (7^^ Mi > be such that Mi(l - 
C4/(c42(™-^)^3) > 2{q + 1)(74/C4, and M2 > be such that M2 < (4/9)(g - 
I)(c2/C2). For M > 0, define 

r(M) := inf I / G £„ : (742"'*^^ < Mc(7„) sup /) I , 

and let := l*{Mi) and /2 := 1*{M2). We will invoke the following lemmas. 
Lemma E.l. Ai := P/(/n < ^1 ~ ^) ^ Cn^^ uniformly over f ^ T . 

Proof of Lemma \E.1[ Define := {/ G Cn '■ I < li — m}. If there is no 
/' G Cn such that I' < we are done. Otherwise, since G £,1 by the fact 
that Cn is closed, there exists some /' G £„ such that /' G (/^ — 5,/^). Fix 
this I'. Then 

o / w 13 ■ f Vn\fnix,l) - fn{x,l')\ ... A 

P,(^. P, (^^mf ^ sup ,^(,^,)^,^(,^,) < '^CnlTn) j 

~ ye£™ sup^g;t'('5"n(2;,0 + '5"n(2;,/')) ~ ^ n J- 
By triangle inequality, 

sup \fn{x, I) - fn{x, > 042"'* - ^2-^'* 

- sup - E/[/„(x,0]| - sup |/„(x,0 - Ef[fn{x,l')]\. 

Further, for / G by the definition of construction of Mi, and since 
t > C3, 

C42~'* - C42-''* > ^ /^^ C4 



2 ~ 2'*+i V C42(™-^)^3 

> „ J^'^ ^ Mc(7,t) supo-„(x,/) (1 



and 



> (g + l)c„(7n) sup 0-n(x, /)/\/n 
ISA' 



C42"'* - C42-''* ^ C4 / C42('"-^)^3 ^ 



2 - 2''*+i I a 



4 



1 /'c42'^"^~'^)'^3 

> ;7^^c(7„) sup (T„(x, /') 

2vra xgA- Y C4 

> (g + l)c„(7„) sup(T„(x,/')/\/"^- 
Combining these inequalities yields 
FfiL <lt-m) 

< P/ sup sup ^/n\fn{x, I) -Ef[fn{x,l)]\ - Cn{'Jn)sU-pan{x, I) >0 

+ P/ ( sup V^|/„(x,Z') - E/[/„(x,/')]| - c„(7„) sup(T„(j;,/') > 

< 2P, f sup > ,^(^^)^ < + Cn- < Cn^ 

where the inequality preceding the last one follows from Lemma D.1 because
$c_{n,f}(\gamma_n) \le C\sqrt{\log n}$, which in turn follows from Theorem A.5 and the
facts that $|\log\gamma_n| \le C\log n$ and $E[\|G_{n,f}\|_{\mathcal{V}_n}] \le C\sqrt{\log n}$. This gives the
asserted claim. □

Lemma E.2. A2 := P/(^n > ^2) — Cn~'^ uniformly over f ^ T . 

Proof of Lemma \E.Si Note that with probability at least 1 — S^n, for all 

sup an{x, /)<(! + esn) sup cJ„j(x, /)<(! + e3„)C22''^/2_ 

xex xex 

Therefore, with the same probability, 

C42-'2*^ < M2CM{1 + e3n)C22'5'^/2, 
and so for all I > I2, 

C42-"V^ < M2C„(7„)(1 + e3„)C22''^/2 

< M2C„(7„)(1 + e3„)(C2/c2) inf fT„ /(x, /) 

x£X 

< M2C„(7„)(1 + e3„)^(C2/c2) inf <7„(x,0 

x£X 

<{q- l)c„(7„) inf <t„(x,/) 

x£X 

where the last inequality follows from the choice of M2 and the fact that 
(1 + esn? < 9/4 for n > uq. So, 



Pf sup A„j(/) >{q- l)Cn(7n) < Ssn- (E.7) 

\i>q J 
Further, by the definition of /„, triangle inequality, and union bound, 



Pf(Jn > ^2) < P/ sup sup 



V^\fn{x,l) - fnix,l')\ 



>l*x€X an{x,l) + an{x,l') 



> qCni^n) 



< 2P/ sup sup — — > gc„(7„) 

\l>l*x(iX CFn{X,l) 




Using the definition of A„j(/) and applying triangle inequality once again, 
the probability in the last expression can be further bounded from above by 

p ( VE\Ux,l)-Ef[Ux,l)]\ , ^ \ 

P/ sup sup — h ^nj{l) > qCn[ln) 

\l>llx£X 0-n{X,l) J 

<(i) P/ sup — > c„(7„) + 

\^>ev„ (yn{v) J 

<(2) In + (^Sn + Cn"^ <(3) Cn~'' 

where (1) follows from (E.7), (2) follows from the same arguments as those in
the proof of Lemma E.1, and (3) holds because $\gamma_n$ and $\delta_{3n}$ are both bounded
by $Cn^{-c}$. This gives the asserted claim. □

Now we verify Condition H4. Note that for all $f \in \mathcal{F}$ and $l \in \mathcal{L}_n$,

c„(7„) sup an{x, I) < c„(7rt)(l + esn) sup an{x, I) < Cc„(7„)2''^/^ 
xgx xex 

with probability at least 1 — where the first inequality follows from 
Condition and the second inequality holds by Condition H2j Therefore, 
with the same probability, I* satisfies 

^2-'*^'+''/^^ <CcM. (E.8) 
Hence, with probability at least 1 — — Ai, 

A„j(/„) <(i) — ^ ^ <(2) — 

"^ ^l-63n)c22W -(4) 

<(5) CV^2-('^-™)(*+'^/2) <(6) CcM 

where (1) follows from Condition 1(31 (2) is by Condition (3) is by 
Condition LO (4) holds because e^n < 1/2, (5) is by Lemma [E.ll and (6) 
follows from (1E.8|1 . This completes the verification of Condition HH 

Next, we verify Condition Irl6l Note that with probability at least 1 — ^sn? 
for all I £ Cn, 

sup (T„(x, /)>(!- esn) sup anj{x, /)>(!- e3„)c22''^/^. 

xex xex 

In addition, by construction, Cn{^n) is the (1 — 7„)-quantile of the maximum 
over a set of A^(0, 1) random variables. Since 7„ < Csn"'^^, this implies that 
Cn(7n) > c-v/Iogn for some constant c > 0. Therefore, with probability at 
least 1 — dsn, 

So, by Lemma lE.21 with probability at least 1 — — A2, 

2"'"*A/n > cyiogn2'""'/^ 
Conclude that with the same probability 

sup 0-„(x, In) < (1 + £3n) sup (Tnj{x^ 



-d/(2t(/)+d) 



Since Js^ + A2 < Cn Condition HSl follows. 

Finally, we verify Condition H5l Let M(j;") denote the median of = 
1 1 Gn {xi , ?i ) 1 1 v„ • Applying Theorem lA.5l conditional on the data gives Cni'^n) < 
M(X") + -y/2|log%J. Further, in the proof of Proposition 15. 11 it was shown 
that there exists a measurable set Sn,Q in M"'^ such that P/(Xf ^ Snfl) < 
Cn~'^ and for each G 5^,0 one can construct a random variable such 
that P(|W'„(x^) - > Ci) < C2 for some Ci and (2 both bounded by Cn'" 
and W^j = IIGnjIlv^. Therefore, 

Pf{M{Xl') > Cnj{l/2 + C2) + Ci) < Cn-'' (E.9) 

with probability at least 1 — Cn~''. Since E[||G„j||v„] < Cl^/logn (assumed 
in Condition HT]), Markov's inequality implies that c„j(l/2+(^2) < C^/Iogn. 
Combining this inequality with (IE.9P gives Condition H5l This completes 
the proof of the proposition. □ 